Abstract

We focus on the problem of performing random walks efficiently in a distributed network. Given 
bandwidth constraints, the goal is to minimize the number of rounds required to obtain a random walk 
sample. We first present a fast sublinear time distributed algorithm for performing random walks whose
time complexity is sublinear in the length of the walk. Our algorithm performs a random walk of length
$\ell$ in $\tilde{O}(\sqrt{\ell D})$ rounds (with high probability) on an undirected network, where $D$ is the diameter of the
network. This improves over the previous best algorithm that ran in $\tilde{O}(\ell^{2/3} D^{1/3})$ rounds (Das Sarma et
al., PODC 2009). We further extend our algorithms to efficiently perform $k$ independent random walks in
$\tilde{O}(\sqrt{k \ell D} + k)$ rounds. We then show that there is a fundamental difficulty in improving the dependence
on $\ell$ any further by proving a lower bound of $\Omega(\sqrt{\ell/\log \ell} + D)$ under a general model of distributed
random walk algorithms. Our random walk algorithms are useful in speeding up distributed algorithms 
for a variety of applications that use random walks as a subroutine. We present two main applications. 
First, we give a fast distributed algorithm for computing a random spanning tree (RST) in an arbitrary 
(undirected) network which runs in $\tilde{O}(\sqrt{m} D)$ rounds (with high probability; here $m$ is the number of
edges). Our second application is a fast decentralized algorithm for estimating mixing time and related
parameters of the underlying network. Our algorithm is fully decentralized and can serve as a building
block in the design of topologically-aware networks. 

Keywords: Random walks, Random sampling, Decentralized computation, Distributed algorithms, Random
Spanning Tree, Mixing Time.

1 Introduction

Random walks play a central role in computer science, spanning a wide range of areas in both theory and 
practice. The focus of this paper is random walks in networks, in particular, decentralized algorithms for 
performing random walks in arbitrary networks. Random walks are used as an integral subroutine in a 
wide variety of network applications ranging from token management and load balancing to search, routing, 
information propagation and gathering, network topology construction and building random spanning trees 
(e.g., see [11] and the references therein). Random walks are also very useful in providing uniform and
efficient solutions to distributed control of dynamic networks [11, 33]. Random walks are local and lightweight
and require little index or state maintenance, which makes them especially attractive to self-organizing dynamic
networks such as Internet overlay and ad hoc wireless networks. 

A key purpose of random walks in many of these network applications is to perform node sampling. 
While the sampling requirements in different applications vary, whenever a true sample is required from a
random walk of a certain number of steps, typically all applications perform the walk naively, by simply passing a
token from one node to its neighbor: thus performing a random walk of length $\ell$ takes time linear in $\ell$.

In this paper, we present a sublinear time (sublinear in $\ell$) distributed random walk sampling algorithm
that is significantly faster than the previous best result. Our algorithm runs in $\tilde{O}(\sqrt{\ell D})$ rounds. We
then present an almost matching lower bound that applies to a general class of distributed algorithms (our 
algorithm also falls in this class). Finally, we present two key applications of our algorithm. The first is a 
fast distributed algorithm for computing a random spanning tree, a fundamental spanning tree problem that 
has been studied widely in the classical setting (see e.g., [19] and references therein). To the best of our
knowledge, our algorithm gives the fastest known running time in an arbitrary network. The second is
devising efficient decentralized algorithms for computing key global metrics of the underlying network:
mixing time, spectral gap, and conductance. Such algorithms can be useful building blocks in the design of 
topologically (self-)aware networks, i.e., networks that can monitor and regulate themselves in a decentralized
fashion. For example, efficiently computing the mixing time or the spectral gap allows the network to
monitor its own connectivity and expansion properties.

1.1 Distributed Computing Model 

Consider an undirected, unweighted, connected $n$-node graph $G = (V, E)$. Suppose that every node (vertex)
hosts a processor with unbounded computational power, but with limited initial knowledge. Specifically,
assume that each node is associated with a distinct identity number from the set $\{1, 2, \ldots, n\}$. At the beginning
of the computation, each node v accepts as input its own identity number and the identity numbers of its 
neighbors in G. The node may also accept some additional inputs as specified by the problem at hand. The 
nodes are allowed to communicate through the edges of the graph G. The communication is synchronous, 
and occurs in discrete pulses, called rounds. In particular, all the nodes wake up simultaneously at the 
beginning of round 1, and from this point on the nodes always know the number of the current round. In each
round each node $v$ is allowed to send an arbitrary message of size $O(\log n)$ through each edge $e = (v, u)$
that is incident to $v$, and the message will arrive at $u$ at the end of the current round. This is a standard model
of distributed computation known as the CONGEST model [29] and has been attracting a lot of research
attention during the last two decades (e.g., see [29] and the references therein).

There are several measures of efficiency of distributed algorithms, but we will concentrate on one of 
them, specifically, the running time, that is, the number of rounds of distributed communication. (Note 
that the computation that is performed by the nodes locally is "free", i.e., it does not affect the number of 
rounds.) Many fundamental network problems such as minimum spanning tree, shortest paths, etc. have been 
addressed in this model (e.g., see [24, 29, 28]). In particular, there has been much research into designing
very fast distributed approximation algorithms (that are even faster at the cost of producing sub-optimal
solutions) for many of these problems (see e.g., [13, 12, 22, 21]). Such algorithms can be useful for large-scale
resource-constrained and dynamic networks where running time is crucial.

1.2 Problem Statement, Motivation, and Related Work

The basic problem we address is the following. We are given an arbitrary undirected, unweighted, and 
connected $n$-node network $G = (V, E)$ and a (source) node $s \in V$. The goal is to devise a distributed
algorithm such that, in the end, $s$ outputs the ID of a node $v$ which is randomly picked according to the
probability that it is the destination of a random walk of length $\ell$ starting at $s$. Throughout this paper, we
assume the standard (simple) random walk: in each step, an edge is taken from the current node $x$ with
probability $1/d(x)$, where $d(x)$ is the degree of $x$. Our goal is to output a true random sample
from the $\ell$-walk distribution starting from $s$.
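
For concreteness, the target distribution can be computed exactly on a small graph by a centralized calculation. The sketch below only illustrates what must be sampled; it is not a distributed algorithm. The function name `walk_distribution` and the adjacency-list representation are assumptions made for this illustration.

```python
from fractions import Fraction


def walk_distribution(adj, source, length):
    """Exact distribution of the endpoint of a `length`-step simple random walk."""
    dist = {v: Fraction(0) for v in adj}
    dist[source] = Fraction(1)
    for _ in range(length):
        nxt = {v: Fraction(0) for v in adj}
        for x, p in dist.items():
            share = p / len(adj[x])  # each incident edge is taken with probability 1/d(x)
            for y in adj[x]:
                nxt[y] += share
        dist = nxt
    return dist


# Path on three nodes 0 - 1 - 2, walk of length 2 starting at node 0.
path_graph = {0: [1], 1: [0, 2], 2: [1]}
dist2 = walk_distribution(path_graph, 0, 2)
```

On this path the first step must go to node 1, and the second step returns to node 0 or moves to node 2 with equal probability, so the source should output 0 or 2 with probability 1/2 each.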

For clarity, observe that the following naive algorithm solves the above problem in $O(\ell)$ rounds: The
walk of length $\ell$ is performed by sending a token for $\ell$ steps, picking a random neighbor with each step.
Then, the destination node $v$ of this walk sends its ID back (along the same path) to the source for output.
Our goal is to perform such sampling in significantly fewer rounds, i.e., in time that is sublinear in
$\ell$. On the other hand, we note that it can take too much time (as much as $\Theta(|E| + D)$ time) in the CONGEST
model to collect all the topological information at the source node (and then compute the walk locally).
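
The naive algorithm can be sketched as a sequential simulation. This is a minimal illustration under assumed names (`naive_random_walk`, an adjacency-list dictionary `adj`); it counts one round per token hop and one round per hop of the returning ID, which is where the $O(\ell)$ bound comes from.

```python
import random


def naive_random_walk(adj, source, length, rng):
    """Simulate the naive token-passing walk; returns (destination, rounds, path)."""
    current = source
    path = [current]
    for _ in range(length):
        current = rng.choice(adj[current])  # uniform neighbor, probability 1/d(x)
        path.append(current)
    # One round per forward hop, plus one round per hop for the ID
    # travelling back to the source along the same path.
    rounds = 2 * length
    return current, rounds, path


# 4-cycle 0 - 1 - 2 - 3 - 0, walk of length 10 from node 0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
dest, rounds, path = naive_random_walk(adj, 0, 10, random.Random(1))
```

Since the 4-cycle is bipartite and the walk length is even, the destination is always node 0 or node 2, whatever the random choices are.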

This problem was proposed in [11] under the name Computing One Random Walk where Source Outputs
Destination (1-RW-SoD) (for short, this problem will be simply called Single Random Walk in this paper),
wherein the first sublinear time distributed algorithm was provided, requiring $\tilde{O}(\ell^{2/3} D^{1/3})$ rounds ($\tilde{O}$ hides
polylog($n$) factors); this improves over the naive $O(\ell)$ algorithm when the walk is long compared to the
diameter (i.e., $\ell = \Omega(D\,\mathrm{polylog}\,n)$, where $D$ is the diameter of the network). This was the first result to
break past the inherent sequential nature of random walks and beat the naive $\ell$-round approach, despite the
fact that random walks have been used in distributed networks for a long time and in a wide variety of applications.

There are two key motivations for obtaining sublinear time bounds. The first is that in many algorithmic 
applications, walks of length significantly greater than the network diameter are needed. For example, this is 
necessary in both the applications presented later in the paper, namely distributed computation of a random 
spanning tree (RST) and computation of mixing time. In the RST algorithm, we need to perform a random 
walk of expected length $O(mD)$ (where $m$ is the number of edges in the network). In decentralized computation
of mixing time, we need to perform walks of length at least equal to the mixing time, which can be
significantly larger than the diameter (e.g., in a random geometric graph model [27], a popular model for ad
hoc networks, the mixing time can be larger than the diameter by a factor of $\Omega(\sqrt{n})$). More generally, many
real-world communication networks (e.g., ad hoc networks and peer-to-peer networks) have relatively small
diameter, and random walks of length at least the diameter are usually performed for many sampling applications,
i.e., $\ell \gg D$. It should be noted that if the network is rapidly mixing/expanding, which is sometimes
the case in practice, then sampling from walks of length $\ell \gg D$ is close to sampling from the steady state
(degree) distribution; this can be done in $O(D)$ rounds (note, however, that this gives only an approximately
close sample, not the exact sample for that length). However, such an approach fails when $\ell$ is smaller than
the mixing time. 

The second motivation is understanding the time complexity of distributed random walks. Random walk 
is essentially a global problem which requires the algorithm to "traverse" the entire network. Classical
"global" problems include the minimum spanning tree, shortest paths, etc. Network diameter is an inherent
lower bound for such problems. Problems of this type raise the basic question of whether $n$ (or $\ell$, as is the case
here) time is essential, or whether the network diameter $D$ is the inherent parameter. As pointed out in the seminal
work of [15], in the latter case, it would be desirable to design algorithms that have a better complexity for
graphs with low diameter. 

The high-level idea used in the $\tilde{O}(\ell^{2/3} D^{1/3})$-round algorithm in [11] is to "prepare" a few short walks
in the beginning (executed in parallel) and then carefully stitch these walks together later as necessary. The
same general approach was introduced in [10] to find random walks in data streams with the main motivation
of finding PageRank. However, the two models have very different constraints and motivations and hence
the subsequent techniques used in [11] and [10] are very different.

Recently, Sami and Twigg [31] consider lower bounds on the communication complexity of computing the
stationary distribution of random walks in a network. Although their problem is related to our problem, the
lower bounds obtained do not imply anything in our setting. Other recent works involving multiple random
walks in different settings include Alon et al. [3] and Cooper et al. Q.

1.3 Our Results 

• A Fast Distributed Random Walk Algorithm: We present a sublinear, almost time-optimal, distributed
algorithm for the single random walk problem in arbitrary networks that runs in time $\tilde{O}(\sqrt{\ell D})$, where $\ell$ is
the length of the walk (cf. Section 2). This is a significant improvement over the naive $\ell$-round algorithm
for $\ell = \tilde{\Omega}(D)$ as well as over the previous best running time of $\tilde{O}(\ell^{2/3} D^{1/3})$ [11]. The dependence on $\ell$
is reduced from $\ell^{2/3}$ to $\ell^{1/2}$.

Our algorithm in this paper uses an approach similar to that of [11] but exploits certain key properties of
random walks to design an even faster sublinear time algorithm. Our algorithm is randomized (Las Vegas
type, i.e., it always outputs the correct result, but the running time claimed is with high probability) and
is conceptually simpler compared to the $\tilde{O}(\ell^{2/3} D^{1/3})$-round algorithm (whose running time is deterministic).
While the previous (slower) algorithm [11] applies to the more general Metropolis-Hastings walk,
in this work we focus primarily on the simple random walk for the sake of obtaining the best possible 
bounds in this commonly used setting. 

One of the key ingredients in the improved algorithm is proving a bound on the number of times any
node is visited in an $\ell$-length walk, for any length $\ell = O(m^2)$. We show that w.h.p. any node $x$ is visited
at most $\tilde{O}(d(x)\sqrt{\ell})$ times in an $\ell$-length walk from any starting node ($d(x)$ is the degree of $x$). We then
show that if only certain $\ell/\lambda$ special points of the walk (called connector points) are observed, then any
node is observed only $\tilde{O}(d(x)\sqrt{\ell}/\lambda)$ times. The algorithm starts with all nodes performing short walks
(of length uniformly random in the range $\lambda$ to $2\lambda$ for appropriately chosen $\lambda$) efficiently and simultaneously;
here the randomly chosen lengths play a crucial role in arguing about a suitable spread of the connector
points. Subsequently, the algorithm begins at the source and carefully stitches these walks together till $\ell$
steps are completed.

We also extend our approach to give algorithms for computing $k$ random walks (from any $k$ sources, not necessarily
distinct) in $\tilde{O}(\min(\sqrt{k \ell D} + k, k + \ell))$ rounds. Computing $k$ random walks is useful in many
applications such as the one we present below on decentralized computation of mixing time and related
parameters. While the main requirement of our algorithms is to just obtain the random walk samples
(i.e., the end point of the $\ell$-step walk), our algorithms can regenerate the entire walks such that each node
knows its position(s) among the $\ell$ steps. Our algorithm can be extended to do this in the same number of
rounds.

• A Lower Bound: We establish an almost matching lower bound on the running time of distributed
random walk that applies to a general class of distributed random walk algorithms. We show that any
algorithm belonging to the class needs at least $\Omega(\sqrt{\ell/\log \ell} + D)$ rounds to perform a random walk of length
$\ell$; notice that this lower bound is nontrivial even in graphs of small ($D = O(\log n)$) diameter (cf. Section
3). Broadly speaking, we consider a class of token forwarding-type algorithms where nodes can only
store and (selectively) forward tokens (here tokens are $O(\log n)$-sized messages consisting of two node
IDs identifying the beginning and end of a segment; we make this more precise in Section 3). Selective
forwarding (more general than just store-and-forward) means that nodes can omit to forward certain
segments (to reduce the number of messages), but they cannot alter tokens in any way (e.g., resort to data
compression techniques). This class includes many natural algorithms, including the algorithm in this
paper.

Our technique involves showing the same non-trivial lower bound for a problem that we call path verification.
This simpler problem appears quite basic and can have other applications. Informally, given a
graph $G$ and a sequence of $\ell$ vertices in the graph, the problem is for some (source) node in the graph
to verify that the sequence forms a path. One main idea in this proof is to show that independent nodes
may be able to verify short local paths; however, to be able to merge these together and verify an $\ell$-length
path would require exchanging several messages. The trade-off is between the lengths of the local paths 
that are verified and the number of such local paths that need to be combined. Locally verified paths can 
be exchanged in one round, and messages can be exchanged at all nodes. Despite this, we show that the 
bandwidth restriction necessitates a large number of rounds even if the diameter is small. We then show 
a reduction to the random walk problem, where we require that each node in the walk should know its 
(correct) position(s) in the walk. 

Similar non-trivial matching lower bounds on running time are known only for a few important problems
in distributed computing, notably the minimum spanning tree problem (e.g., see [30, 14]). Peleg and
Rubinovich [30] showed that $\tilde{\Omega}(\sqrt{n})$ time is required for constructing an MST even on graphs of small
diameter (for any $D = O(\log n)$), and [23] showed an essentially matching upper bound.

Applications: Our faster distributed random walk algorithm can be used in speeding up distributed 
applications where random walks arise as a subroutine. Such applications include distributed construction 
of expander graphs, checking whether a graph is an expander, construction of random spanning trees, and 
random-walk based search (we refer to [11] for details). Here, we present two key applications:

(1) A Fast Distributed Algorithm for Random Spanning Trees (RST): We give an $\tilde{O}(\sqrt{m} D)$ time distributed
algorithm (cf. Section 4.1) for uniformly sampling a random spanning tree in an arbitrary undirected
(unweighted) graph (i.e., each spanning tree in the underlying network has the same probability of being
selected), where $m$ denotes the number of edges in the graph. Spanning trees are fundamental network
primitives, and distributed algorithms for various types of spanning trees such as minimum spanning tree
(MST), breadth-first spanning tree (BFS), shortest path tree, shallow-light trees, etc., have been studied
extensively in the literature [29]. However, not much is known about the distributed complexity of the
random spanning tree problem. The centralized case has been studied for many decades; see, e.g., the
recent work of [19] and the references therein; also see the recent work of Goyal et al. IITtII which gives
nice applications of RST to fault-tolerant routing and constructing expanders. In the distributed context,
the work of Bar-Ilan and Zernik [5] gives a distributed RST algorithm for two special cases, namely that
of a complete graph (running in constant time) and a synchronous ring (running in $O(n)$ time). The work
of H gives a self-stabilizing distributed algorithm for constructing an RST in a wireless ad hoc network and
mentions that RST is more resilient to transient failures that occur in mobile ad hoc networks.

Our algorithm works by giving an efficient distributed implementation of the well-known Aldous-Broder 
random walk algorithm LL. J I for constructing a RST. 

(2) Decentralized Computation of Mixing Time. We present a fast decentralized algorithm for estimating
mixing time, conductance and spectral gap of the network (cf. Section 4.2). In particular, we show that
given a starting point $x$, the mixing time with respect to $x$, called $\tau^x_{mix}$, can be estimated in
$\tilde{O}(n^{1/2} + n^{1/4}\sqrt{D \tau^x_{mix}})$ rounds. This gives an alternative algorithm to the only previously known approach by
Kempe and McSherry [20] that can be used to estimate $\tau^x_{mix}$ in $\tilde{O}(\tau^x_{mix})$ rounds.¹ To compare, we note
that when $\tau^x_{mix} = \omega(n^{1/2})$ the present algorithm is faster (assuming $D$ is not too large).
The work of [16] discusses spectral algorithms for enhancing the topology awareness, e.g., by identifying
and assigning weights to critical links. However, the algorithms are centralized, and it is mentioned that 
obtaining efficient decentralized algorithms is a major open problem. Our algorithms are fully decen- 
tralized and based on performing random walks, and so more amenable to dynamic and self-organizing 
networks. 



¹ Note that [20] in fact do more and give a decentralized algorithm for computing the top $k$ eigenvectors of a weighted adjacency
matrix that runs in $O(\tau_{mix} \log^2 n)$ rounds if two adjacent nodes are allowed to exchange $O(k^3)$ messages per round, where $\tau_{mix}$ is
the mixing time and $n$ is the size of the network.

2 A Sublinear Time Distributed Random Walk Algorithm

2.1 Description of the Algorithm

We first describe the $\tilde{O}(\ell^{2/3} D^{1/3})$-round algorithm in [11] and then highlight the changes in our current
algorithm. The current algorithm is randomized and uses several new ideas that are crucial in obtaining the 
new bound. 

The high-level idea is to perform "many" short random walks in parallel and later stitch them together as
needed (see Figure 2 in the Appendix). In the first phase of the algorithm Single-Random-Walk (we refer
to the Appendix for pseudocode of all algorithms and subroutines), each node performs $\eta$ independent random
walks of length $\lambda$. (Only the destination of each of these walks is aware of its source, but the sources do not
know the destinations right away.) It is shown that this takes $\tilde{O}(\eta\lambda)$ rounds with high probability. Subsequently,
the source node that requires a walk of length $\ell$ extends a walk of length $\lambda$ by "stitching" walks. If the end
point of the first $\lambda$-length walk is $u$, one of $u$'s $\lambda$-length walks is used to extend it. When at $u$, one of its $\lambda$-length
walk destinations is sampled uniformly (to preserve randomness) using Sample-Destination in $O(D)$
rounds. (We call such $u$ and other nodes at the stitching points connectors; cf. Algorithm 1.) Each
stitch takes $O(D)$ rounds (via the shortest path). This process is extended as long as unused $\lambda$-length walks
are available from visited nodes. If the walk reaches a node $v$ where all $\eta$ walks have been used up (which
is a key difficulty), then Get-More-Walks is invoked. Get-More-Walks performs $\eta$ more walks of
length $\lambda$ from $v$, and this can be done in $O(\lambda)$ rounds. The number of times Get-More-Walks is invoked
can be bounded by $\ell/(\eta\lambda)$ in the worst case by an amortization argument. The overall bound on the algorithm is
$\tilde{O}(\eta\lambda + \ell D/\lambda + \ell/\eta)$. The bound of $\tilde{O}(\ell^{2/3} D^{1/3})$ follows from an appropriate choice of the parameters $\eta$ and $\lambda$.
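
Reading the overall bound above as the sum of three terms, $\eta\lambda$ for Phase 1, $\ell D/\lambda$ for the stitches, and $\ell/\eta$ for the calls to Get-More-Walks, the parameter choice can be recovered by balancing them. The short derivation below is a sanity check under that reading, ignoring polylogarithmic factors; it is not quoted from [11].

```latex
\begin{align*}
\eta\lambda = \frac{\ell}{\eta}
  &\;\Longrightarrow\; \lambda = \frac{\ell}{\eta^{2}}, \\
\eta\lambda = \frac{\ell D}{\lambda}
  &\;\Longrightarrow\; \eta\lambda^{2} = \ell D
  \;\Longrightarrow\; \frac{\ell^{2}}{\eta^{3}} = \ell D
  \;\Longrightarrow\; \eta = \left(\frac{\ell}{D}\right)^{1/3}, \\
\lambda = \frac{\ell}{\eta^{2}} &= \ell^{1/3} D^{2/3},
\qquad
\eta\lambda = \frac{\ell D}{\lambda} = \frac{\ell}{\eta} = \ell^{2/3} D^{1/3}.
\end{align*}
```

With $\eta = (\ell/D)^{1/3}$ and $\lambda = \ell^{1/3} D^{2/3}$ all three terms equal $\ell^{2/3} D^{1/3}$, which matches the stated bound.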

The current algorithm uses two crucial ideas to improve the running time. The first idea is to bound 
the number of times any node is visited in a random walk of length $\ell$ (in other words, the number of times
Get-More-Walks is invoked). Instead of the worst case analysis in [11], the new bound is obtained by
bounding the number of times any node is visited (with high probability) in a random walk of length $\ell$ on an
undirected unweighted graph. The number of visits to a node beyond the mixing time can be bounded using
its stationary probability distribution. However, we need a bound on the visits to a node for any $\ell$-length
walk starting from the first step. We show a somewhat surprising bound that applies to an $\ell$-length (for
$\ell = O(m^2)$) random walk on any arbitrary (undirected) graph: no node $x$ is visited more than $\tilde{O}(d(x)\sqrt{\ell})$
times in an $\ell$-length walk from any starting node ($d(x)$ is the degree of $x$) (cf. Lemma 2.6). Note that this
bound does not depend on any other parameter of the graph, just on the (local) degree of the node and the 
length of the walk. This bound is tight in general (e.g., consider a line and a walk of length n). 
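
The line example can be explored empirically. The sketch below simulates a single walk on a path and reports the largest number of visits to any node, normalized by $d(x)\sqrt{\ell}$. It is only an illustration with assumed names (`visit_counts_on_path`, `worst_ratio`) and one fixed random seed; it does not prove or replace the lemma.

```python
import math
import random


def visit_counts_on_path(n, start, length, rng):
    """Walk `length` steps on the path 0 - 1 - ... - (n-1); return visits per node."""
    visits = [0] * n
    x = start
    visits[x] = 1  # count the starting position as a visit
    for _ in range(length):
        if x == 0:
            x = 1            # the left endpoint has a single neighbor
        elif x == n - 1:
            x = n - 2        # the right endpoint has a single neighbor
        else:
            x += rng.choice((-1, 1))
        visits[x] += 1
    return visits


n = 10001
length = 10000
visits = visit_counts_on_path(n, n // 2, length, random.Random(0))


def degree(x):
    return 1 if x in (0, n - 1) else 2


# Largest visit count relative to d(x) * sqrt(length) over all nodes.
worst_ratio = max(visits[x] / (degree(x) * math.sqrt(length)) for x in range(n))
```

A node could in principle be visited about $\ell/2$ times (by bouncing back and forth), so a `worst_ratio` of constant order, rather than of order $\sqrt{\ell}$, is what the bound predicts.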

The above bound is not enough to get the desired running time, as it does not say anything about the 
distribution of connectors when we chop the length-$\ell$ walk into $\ell/\lambda$ pieces. We have to bound the number
of visits to a node as a connector in order to bound the number of times Get-More-Walks is invoked.
To overcome this we use a second idea: Instead of nodes performing walks of length $\lambda$, each such walk $i$
is of length $\lambda + r_i$, where $r_i$ is a random number in the range $[0, \lambda - 1]$. Notice that the random numbers $r_i$
are independent for each walk. We show the following "uniformity lemma": if the short walks are now of a
random length in the range $[\lambda, 2\lambda - 1]$, then if a node $u$ is visited at most $N_u$ times in an $\ell$-step walk, then
the node is visited at most $\tilde{O}(N_u/\lambda)$ times as an endpoint of a short walk (cf. Lemma 2.7). This modification
to Single-Random-Walk allows us to bound the number of visits to each node (cf. Lemma 2.7).

The change of the short walk length above leads to two modifications in Phase 1 of Single-Random-Walk
and Get-More-Walks. In Phase 1, generating $\eta$ walks of different lengths from each node is
straightforward: Each node simply sends $\eta$ tokens containing the source ID and the desired length. The
nodes keep forwarding these tokens with decreased desired walk length until the desired length becomes
zero. The modification of Get-More-Walks is trickier. To avoid congestion, we use the idea of reservoir
sampling [32]. In particular, we add the following process at the end of the Get-More-Walks algorithm
in [11]:

for $i = 0$ to $\lambda - 1$ do
    For each message, independently with probability $1/(\lambda - i)$, stop sending the message further
    and save the ID of the source node (in this event, the node with the message is the destination).
    For messages $M$ that are not stopped, each node picks a neighbor correspondingly
    and sends the messages forward as before.
end for

The reason it needs to be done this way is that if we first sampled the walk length $r$, independently for
each walk, in the range $[0, \lambda - 1]$ and then extended each walk accordingly, the algorithm would need to
pass $r$ independently for each walk. This would cause congestion along the edges; no congestion occurs in the
mentioned algorithm as only the count of the number of walks along an edge is passed to the node across
the edge. Therefore, we need to decide when to stop on the fly using reservoir sampling.
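
That stopping on the fly can still yield a uniformly distributed extra length is easy to check exactly. The sketch below assumes that a token still moving at iteration $i$ stops with probability $1/(\lambda - i)$, which is the standard reservoir-style choice; the function name `extra_length_distribution` is hypothetical.

```python
from fractions import Fraction


def extra_length_distribution(lam):
    """Exact distribution of the extra length r under on-the-fly stopping.

    At iteration i = 0, ..., lam - 1 a token that is still moving stops with
    probability 1 / (lam - i); entry i of the result is Pr[r = i].
    """
    alive = Fraction(1)          # probability that the token has not stopped yet
    dist = []
    for i in range(lam):
        stop = Fraction(1, lam - i)
        dist.append(alive * stop)
        alive *= 1 - stop
    return dist


dist = extra_length_distribution(8)
```

Every entry equals $1/\lambda$ (here $1/8$), so no token ever needs to carry its own pre-sampled length $r$; nodes only need to know how many tokens cross each edge.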

We also have to make another modification in Phase 1 due to the new bound on the number of visits. 
Recall that, in this phase, each node prepares $\eta$ walks of length $\lambda$. However, since the new bound on the visits
to each node $x$ is proportional to its degree $d(x)$ (see Lemma 2.6), we make each node prepare $\eta d(x)$ walks
instead. We show that Phase 1 uses $\tilde{O}(\eta\lambda)$ rounds, instead of $\tilde{O}(\eta\lambda/\delta)$ rounds, where $\delta$ is the minimum degree
in the graph (cf. Lemma 2.1).



To summarize, the main algorithm for performing a single random walk is Single-Random-Walk. 
This algorithm, in turn, uses Get-More-Walks and Sample-Destination. The key modification is that,
instead of creating short walks of length $\lambda$ each, we create short walks where each walk has length in the range
$[\lambda, 2\lambda - 1]$. To do this, we modify Phase 1 of Single-Random-Walk and Get-More-Walks.
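
The overall structure can be sketched as a centralized simulation. This is a simplified illustration under stated assumptions, not the distributed algorithm itself: it ignores rounds and congestion, draws an unused short walk directly from a local list in place of Sample-Destination, refills the list in place of Get-More-Walks, and finishes the last fewer than $2\lambda$ steps naively. All function and variable names are hypothetical.

```python
import random


def short_walk(adj, start, lam, rng):
    """One short walk of random length in [lam, 2*lam - 1]; returns (endpoint, length)."""
    length = lam + rng.randrange(lam)
    x = start
    for _ in range(length):
        x = rng.choice(adj[x])
    return x, length


def single_random_walk(adj, source, ell, lam, eta, rng):
    """Return (destination, number of stitches, number of refills) for an ell-step walk."""
    def prepare(x):
        # Each node x prepares eta * d(x) short walks.
        return [short_walk(adj, x, lam, rng) for _ in range(eta * len(adj[x]))]

    pool = {x: prepare(x) for x in adj}          # Phase 1
    current, done, stitches, refills = source, 0, 0, 0
    while ell - done >= 2 * lam:                 # Phase 2: stitch short walks
        if not pool[current]:
            pool[current] = prepare(current)     # stand-in for Get-More-Walks
            refills += 1
        # Stand-in for Sample-Destination: pick an unused walk uniformly, use it once.
        end, length = pool[current].pop(rng.randrange(len(pool[current])))
        current, done, stitches = end, done + length, stitches + 1
    for _ in range(ell - done):                  # remaining steps, walked naively
        current = rng.choice(adj[current])
    return current, stitches, refills


# 6-cycle, walk of length 100 from node 0 with lam = 5 and eta = 1.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
dest, stitches, refills = single_random_walk(cycle, 0, 100, 5, 1, random.Random(7))
```

Because every short walk is used at most once and its length is chosen independently of its trajectory, the stitched walk has exactly $\ell$ steps; on the bipartite 6-cycle this shows up as the destination always having the same parity as the source when $\ell$ is even.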

We now state four lemmas which are similar to Lemmas 2.2-2.6 in [11]. However, since the algorithm
here is a modification of that in [11], we include the full proofs in Appendix A.2.

Lemma 2.1. Phase 1 finishes in $O(\lambda\eta \log n)$ rounds with high probability.

Lemma 2.2. For any $v$, Get-More-Walks($v$, $\eta$, $\lambda$) always finishes within $O(\lambda)$ rounds.

Lemma 2.3. Sample-Destination always finishes within $O(D)$ rounds.

Lemma 2.4. Algorithm Sample-Destination($v$) (cf. its pseudocode in the Appendix) returns a destination from a random
walk whose length is uniform in the range $[\lambda, 2\lambda - 1]$.

2.2 Analysis 

The following theorem states the main result of this section. It states that the algorithm Single-Random-Walk
correctly samples a node after a random walk of $\ell$ steps and the algorithm takes, with high probability,
$\tilde{O}(\sqrt{\ell D})$ rounds, where $D$ is the diameter of the graph. Throughout this section, we assume that $\ell$ is
$O(m^2)$, where $m$ is the number of edges in the network. If $\ell$ is $\Omega(m^2)$, the required bound is easily achieved
by aggregating the graph topology (via upcast) onto one node in $O(m + D)$ rounds (e.g., see [29]). The
difficulty lies in proving the bound for $\ell = O(m^2)$.

Theorem 2.5. For any $\ell$, Algorithm Single-Random-Walk (cf. Algorithm 1) solves 1-RW-SoD (the
Single Random Walk Problem) and, with probability at least $1 - 2/n$, finishes in $\tilde{O}(\sqrt{\ell D})$ rounds.

We prove the above theorem using the following lemmas. As mentioned earlier, to bound the number of
times Get-More-Walks is invoked, we need a technical result on random walks that bounds the number
of times a node will be visited in an $\ell$-length random walk. Consider a simple random walk on a connected
undirected graph on $n$ vertices. Let $d(x)$ denote the degree of $x$, and let $m$ denote the number of edges. Let
$N^x_t(y)$ denote the number of visits to vertex $y$ by time $t$, given the walk started at vertex $x$. Now, consider $k$
walks, each of length $\ell$, starting from (not necessarily distinct) nodes $x_1, x_2, \ldots, x_k$. We show a key technical
lemma (proof in Appendix A.4) that applies to a random walk on any graph: With high probability, no vertex
$y$ is visited more than $24 d(y) \sqrt{k\ell + 1} \log n + k$ times.

Lemma 2.6. For any nodes $x_1, x_2, \ldots, x_k$, and $\ell = O(m^2)$,

$$\Pr\Big(\exists y \ \text{s.t.}\ \sum_{i=1}^{k} N^{x_i}_{\ell}(y) \ge 24\, d(y) \sqrt{k\ell + 1}\, \log n + k\Big) \le 1/n.$$

This lemma says that the number of visits to each node can be bounded. However, for each node, we are
only interested in the case where it is used as a connector. The lemma below shows that the number of visits
as a connector can be bounded as well; i.e., if any node $v_i$ appears $t$ times in the walk, then it is likely to
appear roughly $t/\lambda$ times as a connector.

Lemma 2.7. For any vertex $v$, if $v$ appears in the walk at most $t$ times then it appears as a connector node
at most $t(\log n)^2/\lambda$ times with probability at least $1 - 1/n^2$.

Intuitively, this argument is simple, since the connectors are spread out in steps of length approximately
$\lambda$. However, there might be some periodicity that results in the same node being visited multiple times but
exactly at $\lambda$-intervals. This is where we crucially use the fact that the algorithm uses walks of length $\lambda + r$
where $r$ is chosen uniformly at random from $[0, \lambda - 1]$. The proof then goes via constructing another process
equivalent to partitioning the $\ell$ steps into intervals of length $\lambda$ and then sampling points from each interval. We
analyze this by carefully constructing a different process that stochastically dominates the process of a node
occurring as a connector at various steps in the $\ell$-length walk and then use a Chernoff bound argument. The
detailed proof is presented in Appendix A.3.



Now we are ready to prove Theorem 2.5.

Proof of Theorem 2.5. First, we claim, using Lemmas 2.6 and 2.7, that each node $x$ is used as a connector node
at most $24 d(x) \sqrt{\ell + 1} (\log n)^3 / \lambda$ times with probability at least $1 - 2/n$. To see this, observe that the claim holds if
each node $x$ is visited at most $t(x) = 24 d(x) \sqrt{\ell + 1} \log n$ times and consequently appears as a connector
node at most $t(x) (\log n)^2/\lambda$ times. By Lemma 2.6, the first condition holds with probability at least $1 - 1/n$.
By Lemma 2.7 and the union bound over all nodes, the second condition holds with probability at least
$1 - 1/n$, provided that the first condition holds. Therefore, both conditions hold together with probability at
least $1 - 2/n$ as claimed.



Now, we choose $\eta = 1$ and $\lambda = 24 \sqrt{\ell D} (\log n)^3$. By Lemma 2.1, Phase 1 finishes in $\tilde{O}(\lambda\eta) = \tilde{O}(\sqrt{\ell D})$
rounds with high probability. For Phase 2, Sample-Destination is invoked $O(\ell/\lambda)$ times (only when we
stitch the walks) and therefore, by Lemma 2.3, contributes $O(\ell D/\lambda) = \tilde{O}(\sqrt{\ell D})$ rounds. Finally, we claim
that Get-More-Walks is never invoked, with probability at least $1 - 2/n$. To see this, recall our claim
above that each node $x$ is used as a connector node at most $24 d(x) \sqrt{\ell + 1} (\log n)^3 / \lambda$ times. Moreover, observe that we
have prepared this many walks in Phase 1; i.e., after Phase 1, each node $x$ has $\eta d(x) \ge 24 d(x) \sqrt{\ell + 1} (\log n)^3 / \lambda$ short
walks. The claim follows.

Therefore, with probability at least $1 - 2/n$, the rounds are $\tilde{O}(\sqrt{\ell D})$ as claimed. □

Regenerating the entire random walk: It is important to note that our algorithm can be extended to
regenerate the entire walk. As described above, the source node obtains the sample after a random walk of
length $\ell$. In certain applications, it may be desired that the entire random walk be obtained, i.e., every node
in the $\ell$-length walk knows its position(s) in the walk. This can be done by first informing all intermediate
connecting nodes of their position (since there are only $O(\sqrt{\ell})$ such nodes). Then, these nodes can regenerate
their $O(\sqrt{\ell})$-length short walks by simply sending a message through each of the corresponding short walks.
This can be completed in $\tilde{O}(\sqrt{\ell D})$ rounds with high probability. This is because, with high probability,
Get-More-Walks will not be invoked and hence all the short walks are generated in Phase 1. Sending a
message through each of these short walks (in fact, sending a message through every short walk generated in
Phase 1) takes time at most the time taken in Phase 1, i.e., $\tilde{O}(\sqrt{\ell D})$ rounds.

2.3 Extension to Computing k Random Walks

We now consider the scenario when we want to compute $k$ walks of length $\ell$ from different (not necessarily
distinct) sources $s_1, s_2, \ldots, s_k$. We show that Single-Random-Walk can be extended to solve this
problem. Consider the following algorithm.

Many-Random-Walks: Let $\lambda = (24\sqrt{k\ell D + 1}\,\log n + k)(\log n)^2$ and $\eta = 1$. If $\lambda > \ell$ then run the
naive random walk algorithm, i.e., the sources find walks of length $\ell$ simultaneously by sending tokens.
Otherwise, do the following. First, modify Phase 2 of Single-Random-Walk to create multiple walks,
one at a time; i.e., in the second phase, we stitch the short walks together to get a walk of length $\ell$ starting
at $s_1$, then do the same thing for $s_2$, $s_3$, and so on. We state the theorem below; the proof is in
Appendix A.5.

Theorem 2.8. Many-Random-Walks finishes in $\tilde{O}(\min(\sqrt{k\ell D} + k,\; k + \ell))$ rounds with high
probability.

3 Lower bound

In this section, we show an almost tight lower bound on the time complexity of performing a distributed
random walk. At the end of the walk, we require that each node in the walk should know its correct position(s)
among the $\ell$ steps. We show that any distributed algorithm needs at least $\Omega(\sqrt{\ell/\log \ell})$ rounds, even in graphs
with low diameter. Note that $\Omega(D)$ is a lower bound [11]. Also note that if a source node wants to sample
$k$ destinations from independent random walks, then $\Omega(k)$ is also a lower bound as the source may need to
receive $\Omega(k)$ distinct messages. Therefore, for $k$ walks, the lower bound we show is $\Omega(\sqrt{\ell/\log \ell} + k + D)$
rounds. (The rest of the section omits the $\Omega(k + D)$ term.) In particular, we show that there exists an $n$-node
graph of diameter $O(\log n)$ such that any distributed algorithm needs at least $\Omega(\sqrt{n/\log n})$ time to perform a
walk of length $n$. Our lower bound proof makes use of a lower bound for another problem that we call the
Path Verification problem, defined as follows. Informally, the Path Verification problem is for some node $v$ to
verify that a given sequence of nodes in the graph is a valid path of length $\ell$.

Definition 3.1 (Path-Verification Problem). The input of the problem consists of an integer $\ell$, a graph
$G = (V, E)$, and $\ell$ nodes $v_1, v_2, \ldots, v_\ell$ in $G$. To be precise, each node $v_i$ initially has its order number $i$.

The goal is for some node $v$ to "verify" that the above sequence of vertices forms an $\ell$-length path, i.e.,
that $(v_i, v_{i+1})$ forms an edge for all $1 \le i \le \ell - 1$. Specifically, $v$ should output "yes" if the sequence forms
an $\ell$-length path and "no" otherwise.

We show a lower bound for the Path Verification problem that applies to a very general class of
verification algorithms defined as follows. Each node can (only) verify a segment of the path that it knows either
directly or indirectly (by learning from its neighbors), as follows. Initially each node knows only the trivial
segment (i.e., the vertex itself). If a vertex obtains from its neighbor a segment $[i_1, j_1]$ and it has already
verified a segment $[i_2, j_2]$ that overlaps with $[i_1, j_1]$ (say, $i_1 \le i_2 \le j_1 \le j_2$) then it can verify a larger interval
($[i_1, j_2]$). Note that a node needs to only send the endpoints of the interval that it already verifies (hence
larger intervals are better). (See Figure 1 in the Appendix for an example.) The goal of the problem is that,
in the end, some node verifies the entire segment $[1, \ell]$. We would like to determine a lower bound for the
running time of any distributed algorithm for the above problem.
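The merging rule in this verification model can be illustrated with a small sketch. The function `merge_segments` is a hypothetical helper, and for simplicity it assumes a node tracks a single verified segment given as an inclusive pair of order numbers.

```python
def merge_segments(known, received):
    """Merge a received verified segment into the segment a node already holds.

    Segments are inclusive (i, j) pairs of order numbers. If the two segments
    overlap, the node can verify their union; otherwise it keeps what it had.
    """
    i2, j2 = known
    i1, j1 = received
    # Two intervals overlap exactly when they share at least one position.
    if max(i1, i2) <= min(j1, j2):
        return (min(i1, i2), max(j1, j2))
    return known

# A node that verified [3, 8] and receives [1, 5] can now verify [1, 8].
print(merge_segments((3, 8), (1, 5)))
# A disjoint segment such as [10, 12] does not help.
print(merge_segments((3, 8), (10, 12)))
```

Only the two endpoints are exchanged, which is why a single message suffices to communicate an arbitrarily long verified segment.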

A lower bound for the Path Verification problem implies a lower bound for the random walk problem
as well. The reason is as follows. Both problems involve constructing a path of some specified length $\ell$.
Intuitively, the former is a simpler problem, since we are not verifying whether the local steps are chosen
randomly, but just whether the path is valid and is of length $\ell$. On the other hand, any algorithm for the random
walk problem (including our algorithm of Section 2) also solves the Path Verification problem, since the path
it constructs should be a valid path of length $\ell$. It is straightforward to make any distributed algorithm that
computes a random walk also verify that the random walk is indeed a valid walk of appropriate length.
This is essential for correctness, as otherwise an adversary can always change just one edge of the graph
and ensure that the walk is wrong.

In the next section we first prove a lower bound for the Path Verification problem. Then we show that the
same lower bound holds for the random walk problem by giving a reduction.

3.1 Lower Bound for the Path Verification Problem

The main result of this section is the following theorem.

Theorem 3.2. For every $n$, and $\ell \le n$, there exists a graph $G_n$ of $\Theta(n)$ vertices and diameter $O(\log n)$, and
a path $P$ of length $\ell$ such that any algorithm that solves the Path-Verification problem on $G_n$ and $P$
requires more than $k$ rounds, where $k = \sqrt{\ell/\log \ell}$.

The rest of the section is devoted to proving the above theorem. We start by defining $G_n$.

Definition 3.3 (Graph $G_n$). Let $k'$ be an integer such that $k'$ is a power of 2 and $k'/2 \le 4k < k'$. Let $n'$ be
such that $n' \ge n$ and $k'$ divides $n'$. We construct $G_n$ having $(n' + 2k' - 1) = O(n)$ nodes as follows. First,
we construct a path $P = v_1 v_2 \ldots v_{n'}$. Second, we construct a binary tree $T$ having $k'$ leaf nodes. Let $u_1, u_2, \ldots, u_{k'}$
be its leaves from left to right. Finally, we connect $P$ with $T$ by adding an edge $u_i v_{jk'+i}$ for every $i$ and $j$.
We will denote the root of $T$ by $x$ and its left and right children by $l$ and $r$ respectively. Clearly, $G_n$ has
diameter $O(\log n)$. We then consider a path of length $\ell = \Theta(n)$. If required, $n$ can always be made larger by
connecting dummy vertices to the root of $T$. (The resulting graph $G_n$ is as in Figure 3 in the Appendix.) □
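The construction of $G_n$ can be sketched as follows. This is an illustration under stated assumptions: `build_gn` and `eccentricity` are hypothetical names, $T$ is taken to be a complete binary tree stored in heap order (node $t$ has children $2t$ and $2t+1$), and nodes are labelled with tagged tuples.

```python
from collections import deque

def build_gn(n_prime, k_prime):
    """Adjacency sets for G_n: path v_1..v_{n'}, a complete binary tree with
    k' leaves, and an edge between leaf u_i and every v_{j k' + i}."""
    assert k_prime >= 2 and k_prime & (k_prime - 1) == 0, "k' must be a power of 2"
    assert n_prime % k_prime == 0, "k' must divide n'"
    adj = {}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # The path P = v_1 v_2 ... v_{n'}.
    for i in range(1, n_prime):
        add_edge(("v", i), ("v", i + 1))
    # The tree T in heap order; internal nodes are 1..k'-1, leaves are k'..2k'-1.
    for t in range(1, k_prime):
        add_edge(("t", t), ("t", 2 * t))
        add_edge(("t", t), ("t", 2 * t + 1))
    # Leaf u_i is heap node k' + i - 1; connect it to v_{j k' + i} for every j.
    for j in range(n_prime // k_prime):
        for i in range(1, k_prime + 1):
            add_edge(("t", k_prime + i - 1), ("v", j * k_prime + i))
    return adj

def eccentricity(adj, source):
    """Largest BFS distance from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return max(dist.values())

adj = build_gn(n_prime=64, k_prime=8)
print(len(adj), eccentricity(adj, ("t", 1)))
```

Every path node hangs off a leaf of $T$, so the root reaches any node within $\log_2 k' + 1$ hops and the diameter is at most twice that.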

To prove the theorem, let $\mathcal{A}$ be any algorithm for the Path-Verification problem that solves the
problem on $G_n$ in at most $k$ rounds. We need some definitions and claims to prove the theorem.

Definitions of left/right subtrees and breakpoints. Consider a tree $T'$ obtained by deleting all edges in $P$.
Notice that nodes $v_{jk'+i}$, for all $j$ and $i \le k'/2$, are in the subtree of $T'$ rooted at $l$ and all remaining nodes of $P$
are in the subtree rooted at $r$. For any node $v$, let $sub(v)$ denote the subtree rooted at node $v$. (Note that
$sub(v)$ also includes nodes in the path $P$.) We denote the set of nodes that are leaves of $sub(l)$ by $L$ (i.e.,
$L = sub(l) \cap P$) and the set of nodes that are leaves in $sub(r)$ by $R$.

Since we consider an algorithm that takes at most $k$ rounds, consider the situation when the algorithm is
given $k$ rounds for free to communicate only along the edges of the path $P$ at the beginning. Since $L$ and $R$
alternate along $P$ in blocks of $k'/2$ consecutive vertices and $k'/2 > 2k$, there are some nodes unreachable from $L$ by walking on
$P$ for $k$ steps. In particular, all nodes of the form $v_{jk'+k'/2+k+1}$, for all $j$, are not reachable from $L$. We call
such nodes breakpoints for $sub(l)$. Similarly all nodes of the form $v_{jk'+k+1}$, for all $j$, are not reachable from
$R$ and we call them the breakpoints for $sub(r)$. (See Figure 4 in the Appendix.)
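The unreachability claim behind the breakpoints can be checked by brute force on a small instance. In this sketch the function name `breakpoints` is hypothetical; it returns every path index that cannot be reached from $L$ (respectively $R$) within $k$ steps along $P$, and the nodes named in the text are among them.

```python
def breakpoints(n_prime, k_prime, k):
    """Return (far_from_L, far_from_R): path indices that are more than k
    path steps away from every node of L, respectively of R."""
    half = k_prime // 2

    def side(index):
        # v_{j k' + i} with i <= k'/2 hangs under the left child, else the right.
        return "L" if (index - 1) % k_prime < half else "R"

    def unreachable_from(target):
        result = []
        for index in range(1, n_prime + 1):
            lo, hi = max(1, index - k), min(n_prime, index + k)
            if all(side(other) != target for other in range(lo, hi + 1)):
                result.append(index)
        return result

    return unreachable_from("L"), unreachable_from("R")

# Example with k' = 16 and k = 3, so that k'/2 <= 4k < k' holds.
far_from_l, far_from_r = breakpoints(n_prime=64, k_prime=16, k=3)
print([16 * j + 8 + 3 + 1 for j in range(4)], far_from_l)
print([16 * j + 3 + 1 for j in range(4)], far_from_r)
```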

Definitions of path-distance and covering. For any two nodes $u$ and $v$ in $T'$ (obtained from $G_n$ by deleting
edges in $P$), let $c(u, v)$ be the lowest common ancestor of $u$ and $v$. We define path_dist$(u, v)$ to be the number
of leaves of the subtree of $T$ rooted at $c(u, v)$. Note that the path-distance is defined between any pair of nodes in
$G_n$ but the distance is counted using the number of leaves in $T$ (which excludes nodes in $P$). (See Figure 5(a)
in the Appendix.)

We also introduce the notion of the path-distance covered by a message. For any message $m$, the
path-distance covered by $m$ is the maximum path-distance taken over all nodes that have held the message $m$.
That is, if $m$ covers some nodes $v'_1, v'_2, \ldots, v'_k$ then the path-distance covered by $m$ is the number of leaves in
the subtrees of $T$ rooted at $v'_1, v'_2, \ldots, v'_k$. Note that some leaves may be in more than one subtree and they
will be counted only once. Our construction makes the right and left subtrees have a large number of
breakpoints, as in the following lemma. (The proof can be found in Appendix B.1.)



Lemma 3.4. The number of breakpoints for the left subtree and for the right subtree are at least ^ each. 
The reason we define these breakpoints is to show that the entire information held by the left subtree has
many disjoint intervals, and the same for the right subtree. This then tells us that the left subtree and the right
subtree must communicate a lot to be able to merge these intervals by connecting/communicating the
breakpoints. To argue this, we show that the total path distance (over all messages) is large, as in the following
lemma. (The proof is in Appendix B.2.)

Lemma 3.5. For algorithm $\mathcal{A}$ to solve the Path-Verification problem, the total path-distance covered by all
messages is at least $n$.

These messages can however be communicated using the tree edges as well. We bound the maximum
communication that can be achieved across $sub(l)$ and $sub(r)$ indirectly by bounding the maximum
path-distance that can be covered in each round. In particular, we show the following lemma. (See Figure 5(c) and
the proof in the Appendix.)

Lemma 3.6. In $k$ rounds, all messages together can cover at most a path-distance of $O(k^2 \log k)$.

We now describe the proof of the main theorem using these three claims.

Proof of Theorem 3.2. Using Lemmas 3.5 and 3.6, we know that if $\mathcal{A}$ solves Path-Verification, then it
needs to cover a path-distance of $n$, but in $k$ rounds it can only cover a path-distance of $O(k^2 \log k)$. But this is
$o(n)$ since $k = \sqrt{n/\log n}$, a contradiction. □

3.2 Reduction to Random Walk Problem

We now discuss how the lower bound for the Path Verification problem implies the lower bound for the
random walk problem. The main difference between the Path-Verification problem and the random walk
problem is that in the former we can specify which path to verify while the latter problem generates a different
path each time. We show that the "bad" instance ($G_n$ and $P$) in the previous section can be modified so
that with high probability, the generated random walk is "hard" to verify. The theorems below are stated for
an $\ell$-length walk/path instead of $n$ as above. As previously stated, if it is desired that $\ell$ be $o(n)$, it is always
possible to add dummy nodes.

Theorem 3.7. For any $n$, there exists a graph $G_n$ of $\Theta(n)$ vertices and diameter $O(\log n)$, and $\ell = \Theta(n)$
such that, with high probability, a random walk of length $\ell$ needs $\Omega(\sqrt{\ell/\log \ell})$ rounds.



Proof. Theorem 3.2 can be generalized to the case where the path $P$ has infinite capacity, as follows.

Theorem 3.8. For any $n$ and $\ell = \Theta(n)$, there exists a graph $G_n$ of $O(n)$ vertices and diameter $O(\log n)$,
and a path $P$ of length $\ell$ such that any algorithm that solves the Path-Verification problem on $G_n$ and $P$
requires $\Omega(\sqrt{\ell/\log \ell})$ rounds, even if edges in $P$ have large capacity (i.e., one can send larger sized
messages in one step).

Proof. This is because the proof of Theorem 3.2 only uses the congestion of edges in the tree $T$ (imposed
above $P$) to argue about the number of rounds. □

Now, we modify $G_n$ to $G'_n$ as follows. Recall that the path $P$ in $G_n$ has vertices $v_1, v_2, \ldots, v_{n'}$. For
each $i = 1, 2, \ldots, n'$, we define the weight of the edge $(v_i, v_{i+1})$ to be $(2n)^{2i}$ (note that weighted graphs are
equivalent to unweighted multigraphs in our model). By having more weight, these edges have more capacity
as well. However, increasing capacity does not affect the claim as shown above. Observe that, when the walk
is at the node $v_i$, the probability that the walk will take the edge $(v_i, v_{i+1})$ is at least $1 - 1/n^2$. Therefore, $P$ is the
resulting random walk with probability at least $1 - 1/n$. When the random walk path is $P$, it takes at least
$\Omega(\sqrt{\ell/\log \ell})$ rounds to verify, by Theorem 3.8. This completes the proof. We remark that this construction requires
a number of edges (multiedges) exponential in $n$. For the distributed computing model, this only translates to
a larger bandwidth. The length $\ell$ is still comparable to the number of nodes. □
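The effect of the edge weights can be verified with exact arithmetic. The sketch below assumes that every path node has exactly one tree edge of weight 1 and that edge $(v_i, v_{i+1})$ has weight $(2n)^{2i}$; the helper names `forward_probability` and `path_probability` are hypothetical.

```python
from fractions import Fraction

def forward_probability(i, n, tree_edges=1):
    """Exact probability that a walk at v_i moves to v_{i+1}, when edge
    (v_i, v_{i+1}) has weight (2n)^(2i) and each tree edge has weight 1."""
    forward = (2 * n) ** (2 * i)
    backward = (2 * n) ** (2 * (i - 1)) if i > 1 else 0
    return Fraction(forward, forward + backward + tree_edges)

def path_probability(n, length):
    """Probability that a walk started at v_1 follows P for `length` steps."""
    prob = Fraction(1)
    for i in range(1, length + 1):
        prob *= forward_probability(i, n)
    return prob

n = 10
# Each single step is forward with probability at least 1 - 1/n^2 ...
print(min(forward_probability(i, n) for i in range(1, n + 1)) >= 1 - Fraction(1, n * n))
# ... so n consecutive forward steps happen with probability at least 1 - 1/n.
print(path_probability(n, n) >= 1 - Fraction(1, n))
```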
4 Applications

In this section, we present two applications of our algorithm.

4.1 A Distributed Algorithm for Random Spanning Tree

We now present an algorithm for generating a random spanning tree (RST) of an unweighted undirected
network in $\tilde{O}(\sqrt{m}D)$ rounds with high probability. The approach is to simulate Aldous and Broder's
RST algorithm, which is as follows. First, pick one arbitrary node as a root. Then, perform a random walk
from the root node until all nodes are visited. For each non-root node, output the edge that is used for its first
visit. (That is, for each non-root node $v$, if the first time $v$ is visited is $t$ then we output the edge $(u, v)$ where
$u$ is the node visited at time $t - 1$.) The output edges clearly form a spanning tree and this spanning tree is
shown to come from a uniform distribution among all spanning trees of the graph. The expected time
of this algorithm is the expected cover time of the graph, which is shown to be $O(mD)$ (in the worst case,
i.e., for any undirected, unweighted graph) by Aleliunas et al. [2].
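For reference, the sequential procedure described above (random walk from the root, keep each first-visit edge) can be sketched in a few lines. This is a centralized illustration only, not the distributed simulation; the function name `aldous_broder` and the adjacency-list input format are assumptions of the sketch.

```python
import random

def aldous_broder(adj, root, rng=random):
    """Walk randomly from root until all nodes are visited and return the
    list of first-visit edges (u, v), which form a spanning tree."""
    visited = {root}
    tree = []
    current = root
    while len(visited) < len(adj):
        nxt = rng.choice(adj[current])
        if nxt not in visited:
            # First visit to nxt: the edge just used enters the tree.
            visited.add(nxt)
            tree.append((current, nxt))
        current = nxt
    return tree

# A triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(aldous_broder(graph, root=0, rng=random.Random(7)))
```

The number of steps taken by the loop is exactly the cover time of the walk, which is the quantity the distributed algorithm speeds up.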

This algorithm can be simulated on the distributed network by our random walk algorithm as follows.
The algorithm can be viewed in phases. Initially, we pick a root node arbitrarily and set $\ell = n$. In each phase,
we run $\log n$ (different) walks of length $\ell$ starting from the root node (this takes $\tilde{O}(\sqrt{\ell D})$ rounds using our
distributed random walk algorithm). If none of the $O(\log n)$ different walks cover all nodes (this can be
easily checked in $O(D)$ time), we double the value of $\ell$ and start a new phase, i.e., perform again $\log n$ walks
of length $\ell$. The algorithm continues until one walk of length $\ell$ covers all nodes. We then use such a walk to
construct a random spanning tree: As the result of this walk, each node knows its position(s) in the walk (cf.
Section 2.2), i.e., it has a list of the steps in the walk at which it is visited. Therefore, each non-root node can pick an
edge that is used in its first visit by communicating with its neighbors. Thus at the end of the algorithm, each
node can know which of its adjacent edges belong to the output tree. (An additional $O(n)$ rounds may be
used to deliver the resulting tree to a particular node if needed.)

We now analyze the number of rounds in terms of $\tau$, the expected cover time of the input graph. The
algorithm takes $O(\log \tau)$ phases before $2\tau \le \ell \le 4\tau$, and since one of $\log n$ random walks of length $2\tau$
will cover the input graph with high probability, the algorithm will stop with $\ell \le 4\tau$ with high probability.
Since each phase takes $\tilde{O}(\sqrt{\ell D})$ rounds, the total number of rounds is $\tilde{O}(\sqrt{\tau D})$ with high probability. Since
$\tau = O(mD)$, we have the following theorem.

Theorem 4.1. The algorithm described above generates a uniform random spanning tree in $\tilde{O}(\sqrt{m}D)$
rounds with high probability.

4.2 Decentralized Estimation of Mixing Time 

We now present an algorithm to estimate the mixing time of a graph from a specified source. Throughout
this section, we assume that the graph is connected and non-bipartite (the conditions under which mixing
time is well-defined). The main idea in estimating the mixing time is, given a source node, to run many
random walks of length $\ell$ using the approach described in the previous section, and use these to estimate the
distribution induced by the $\ell$-length random walk. We then compare the distribution at length $\ell$ with the
stationary distribution to determine if they are close, and if not, double $\ell$ and retry. For this approach, one
issue that we need to address is how to compare two distributions with few samples efficiently (a well-studied
problem). We introduce some definitions before formalizing our approach and theorem.

Definition 4.2 (Distribution vector). Let $\pi_x(t)$ define the probability distribution vector reached after $t$ steps
when the initial distribution starts with probability 1 at node $x$. Let $\pi$ denote the stationary distribution vector.

Definition 4.3 ($\tau^x(\epsilon)$ and $\tau^x_{mix}$, mixing time for source $x$). Define $\tau^x(\epsilon) = \min\{t : \|\pi_x(t) - \pi\|_1 < \epsilon\}$. Define
$\tau^x_{mix} = \tau^x(1/2e)$.
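To make the two definitions concrete, the quantity $\tau^x(\epsilon)$ can be computed exactly on a small graph by iterating the walk's transition rule. This is a centralized sketch for illustration, not the decentralized estimator of this section; the names `l1_to_stationary` and `tau` are hypothetical, and the graph is given as a list of neighbor lists.

```python
from fractions import Fraction
import math

def l1_to_stationary(adj, source):
    """Yield ||pi_x(t) - pi||_1 for t = 0, 1, 2, ... as exact rationals."""
    two_m = sum(len(nbrs) for nbrs in adj)
    # Stationary distribution of an undirected graph: deg(v) / 2m.
    stationary = [Fraction(len(nbrs), two_m) for nbrs in adj]
    dist = [Fraction(int(u == source)) for u in range(len(adj))]
    while True:
        yield sum(abs(p - q) for p, q in zip(dist, stationary))
        new = [Fraction(0)] * len(adj)
        for u, mass in enumerate(dist):
            for w in adj[u]:
                new[w] += mass / len(adj[u])
        dist = new

def tau(adj, source, eps, max_steps=1000):
    """Smallest t <= max_steps with ||pi_x(t) - pi||_1 < eps, else None."""
    for t, distance in zip(range(max_steps + 1), l1_to_stationary(adj, source)):
        if distance < eps:
            return t
    return None

triangle = [[1, 2], [0, 2], [0, 1]]   # connected and non-bipartite
print(tau(triangle, source=0, eps=1 / (2 * math.e)))
```

On the triangle the distance halves at every step (4/3, 2/3, 1/3, 1/6, ...), so the threshold $1/2e$ is first crossed at $t = 3$.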

The goal is to estimate $\tau^x_{mix}$. Notice that the definition of $\tau^x_{mix}$ is consistent due to the following standard
monotonicity property of distributions (proof in the appendix).
Lemma 4.4. $\|\pi_x(t + 1) - \pi\|_1 \le \|\pi_x(t) - \pi\|_1$.

To compare two distributions, we use the technique of Batu et al. [6] to determine if the distributions are
$\epsilon$-near. Their result (slightly restated) is summarized in the following theorem.

Theorem 4.5 ([6]). For any $\epsilon$, given $\tilde{O}(n^{1/2} poly(\epsilon^{-1}))$ samples of a distribution $X$ over $[n]$, and a specified
distribution $Y$, there is a test that outputs PASS with high probability if $|X - Y|_1 \le \frac{\epsilon^3}{4\sqrt{n}\log n}$, and outputs
FAIL with high probability if $|X - Y|_1 \ge 6\epsilon$.

We now give a very brief description of the algorithm of Batu et al. [6] to illustrate that it can in fact be
simulated on the distributed network efficiently. The algorithm partitions the set of nodes into buckets based
on the steady state probabilities. Each of the $\tilde{O}(n^{1/2} poly(\epsilon^{-1}))$ samples from $X$ now falls in one of these
buckets. Further, the actual number of nodes in these buckets for distribution $Y$ is counted. The
exact count for $Y$ for at most $\tilde{O}(n^{1/2} poly(\epsilon^{-1}))$ buckets (corresponding to the samples) is compared with
the number of samples from $X$; these are compared to determine if $X$ and $Y$ are close. We refer the reader
to their paper [6] for a precise description.

Our algorithm starts with $\ell = 1$ and runs $K = \tilde{O}(\sqrt{n})$ walks of length $\ell$ from the specified source $x$. As
long as the test of comparison with the steady state distribution outputs FAIL (for the choice of $\epsilon = 1/12e$), $\ell$ is doubled.
This process is repeated to identify the largest $\ell$ such that the test outputs FAIL with high probability and the
smallest $\ell$ such that the test outputs PASS with high probability. These give lower and upper bounds on the
required $\tau^x_{mix}$ respectively. Our resulting theorem is presented below and the proof is placed in the appendix.

Theorem 4.6. Given a graph with diameter $D$, a node $x$ can find, in $\tilde{O}(n^{1/2} + n^{1/4}\sqrt{D\tau^x(\epsilon)})$ rounds, a
time $\tilde{\tau}^x_{mix}$ such that $\tau^x_{mix} \le \tilde{\tau}^x_{mix} \le \tau^x(\epsilon)$, where $\epsilon = \frac{1}{6912e\sqrt{n}\log n}$.

Suppose our estimate of $\tau^x_{mix}$ is close to the mixing time of the graph defined as $\tau_{mix} = \max_x \tau^x_{mix}$; then
this would allow us to estimate several related quantities. Given a mixing time $\tau_{mix}$, we can approximate
the spectral gap ($1 - \lambda_2$) and the conductance ($\Phi$) due to the known relations that $\frac{1}{1-\lambda_2} \le \tau_{mix} \le \frac{\log n}{1-\lambda_2}$ and
$\Theta(1 - \lambda_2) \le \Phi \le \Theta(\sqrt{1 - \lambda_2})$.

5 Concluding Remarks

This paper makes progress towards resolving the time complexity of distributed computation of random 
walks in undirected networks. The dependence on the diameter D is still not tight, and it would be interesting 
to settle this. There is also a gap in our bounds for performing k independent random walks. Further, we 
look at the CONGEST model enforcing a bandwidth restriction and minimize number of rounds. While our 
algorithms have good amortized message complexity over several walks, it would be nice to come up with 
algorithms that are round efficient and yet have smaller message complexity. 

We presented two algorithmic applications of our distributed random walk algorithm: estimating mixing 
times and computing random spanning trees. It would be interesting to improve upon these results. For
example, is there a $\tilde{O}(\sqrt{\tau^x_{mix}} + n^{1/4})$ round algorithm to estimate $\tau^x_{mix}$; and is there a $\tilde{O}(n)$ round algorithm
for RST?

There are several interesting directions to take this work further. Can these techniques be useful for esti- 
mating the second eigenvector of the transition matrix (useful for sparse cuts)? Are there efficient distributed 
algorithms for random walks in directed graphs (useful for PageRank and related quantities)? Finally, from 
a practical standpoint, it is important to develop algorithms that are robust to failures and it would be nice to 
extend our techniques to handle such node/edge failures. 
Appendix

A Omitted Proofs of Section 2 (Upper Bound)

A.1 Algorithm descriptions

The main algorithm for performing a single random walk is described in Single-Random-Walk (cf.
Algorithm 1). This algorithm, in turn, uses Get-More-Walks (cf. Algorithm 2) and Sample-Destination (cf. Algorithm 3).

Notice that in Line 9 in Algorithm 2, the walks of length $\lambda$ are extended further to walks of length $\lambda + r$
where $r$ is a random number in the range $[0, \lambda - 1]$. We do this by extending the $\lambda$-length walks further, and
probabilistically stopping each walk in each of the next $i$ steps (for $0 \le i \le \lambda - 1$) with probability $\frac{1}{\lambda - i}$.
The reason it needs to be done this way is that if we first sampled $r$, independently for each walk, in the
range $[0, \lambda - 1]$ and then extended each walk accordingly, the algorithm would need to pass $r$ independently
for each walk. This would cause congestion along the edges; no congestion occurs in the mentioned algorithm
as only the count of the number of walks along an edge is passed to the node across the edge.

Algorithm 1 Single-Random-Walk($s$, $\ell$)
Input: Starting node $s$, and desired walk length $\ell$.
Output: Destination node of the walk outputs the ID of $s$.

Phase 1: (Each node $v$ performs $\eta_v = \eta \deg(v)$ random walks of length $\lambda + r_i$ where $r_i$ (for each
$1 \le i \le \eta_v$) is chosen independently at random in the range $[0, \lambda - 1]$.)

1: Let $r_{max} = \max_{1 \le i \le \eta_v} r_i$, the random numbers chosen independently for each of the $\eta_v$ walks.
2: Each node $x$ constructs $\eta_x$ messages containing its ID and in addition, the $i$-th message contains the
   desired walk length of $\lambda + r_i$.
3: for $i = 1$ to $\lambda + r_{max}$ do
4:   This is the $i$-th iteration. Each node $v$ does the following: Consider each message $M$ held by $v$ and
     received in the $(i - 1)$-th iteration (having current counter $i - 1$). If the message $M$'s desired walk
     length is at most $i$, then $v$ stores the ID of the source ($v$ is the desired destination). Else, $v$ picks a
     neighbor $u$ uniformly at random and forwards $M$ to $u$ after incrementing its counter.
     {Note that any iteration could require more than 1 round.}
5: end for

Phase 2: (Stitch $\Theta(\ell/\lambda)$ walks, each of length in $[\lambda, 2\lambda - 1]$)

1: The source node $s$ creates a message called "token" which contains the ID of $s$.
2: The algorithm generates a set of connectors, denoted by $C$, as follows.
3: Initialize $C = \{s\}$
4: while Length of walk completed is at most $\ell - 2\lambda$ do
5:   Let $v$ be the node that is currently holding the token.
6:   $v$ calls Sample-Destination($v$) and let $v'$ be the returned value (which is a destination of an unused
     random walk starting at $v$ of length between $\lambda$ and $2\lambda - 1$.)
7:   if $v'$ = NULL (all walks from $v$ have already been used up) then
8:     $v$ calls Get-More-Walks($v$, $\lambda$) (Perform $\Theta(\ell/\lambda)$ walks of length $\lambda$ starting at $v$)
9:     $v$ calls Sample-Destination($v$) and let $v'$ be the returned value
10:  end if
11:  $v$ sends the token to $v'$
12:  $C = C \cup \{v\}$
13: end while
14: Walk naively until $\ell$ steps are completed (this is at most another $2\lambda$ steps)
15: A node holding the token outputs the ID of $s$
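The control flow of Algorithm 1 can be mimicked by a centralized simulation. The sketch below is an illustration under stated assumptions, not the distributed protocol: it ignores rounds and congestion, keeps whole short walks rather than only their endpoints, replaces Sample-Destination by a uniform choice among a node's unused walks, and uses a simple stand-in for Get-More-Walks. The names `short_walk` and `single_random_walk` are hypothetical.

```python
import random

def short_walk(adj, start, length, rng):
    """A random walk of the given number of steps, returned as a node list."""
    path = [start]
    for _ in range(length):
        path.append(rng.choice(adj[path[-1]]))
    return path

def single_random_walk(adj, source, ell, lam, eta=1, rng=None):
    """Return (walk, connectors) for a walk of exactly ell steps from source."""
    rng = rng or random.Random()
    # Phase 1: every node v prepares eta * deg(v) walks of length lam + r,
    # with r uniform in [0, lam - 1].
    store = {
        v: [short_walk(adj, v, lam + rng.randrange(lam), rng)
            for _ in range(eta * len(adj[v]))]
        for v in adj
    }
    # Phase 2: stitch unused short walks while at most ell - 2*lam steps are done.
    walk, connectors = [source], []
    while len(walk) - 1 <= ell - 2 * lam:
        v = walk[-1]
        if not store[v]:
            # Stand-in for Get-More-Walks: prepare fresh short walks from v.
            store[v] = [short_walk(adj, v, lam + rng.randrange(lam), rng)
                        for _ in range(max(1, ell // lam))]
        # Stand-in for Sample-Destination: pick an unused walk uniformly, then delete it.
        segment = store[v].pop(rng.randrange(len(store[v])))
        walk.extend(segment[1:])
        connectors.append(v)
    # Finish naively, one step at a time (fewer than 2*lam steps remain).
    while len(walk) - 1 < ell:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk, connectors

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk, connectors = single_random_walk(graph, source=0, ell=50, lam=4, rng=random.Random(1))
print(len(walk) - 1, len(connectors))
```

Since each stitched segment has between $\lambda$ and $2\lambda - 1$ steps, the number of stitches is $\Theta(\ell/\lambda)$, matching the Phase 2 header.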
Algorithm 2 Get-More-Walks($v$, $\lambda$)

(Starting from node $v$, perform $\lfloor \ell/\lambda \rfloor$ random walks, each of length $\lambda + r_i$ where $r_i$ is chosen
uniformly at random in the range $[0, \lambda - 1]$ for the $i$-th walk.)

1: The node $v$ constructs $\lfloor \ell/\lambda \rfloor$ (identical) messages containing its ID.
2: for $i = 1$ to $\lambda$ do
3:   Each node $u$ does the following:
4:   - For each message $M$ held by $u$, pick a neighbor $z$ uniformly at random as a receiver of $M$.
5:   - For each neighbor $z$ of $u$, send the ID of $v$ and the number of messages for which $z$ is picked as a receiver,
       denoted by $c(u, z)$.
6:   - Each neighbor $z$ of $u$, upon receiving the ID of $v$ and $c(u, z)$, constructs $c(u, z)$ messages, each
       containing the ID of $v$.
7: end for
   {Each walk has now completed $\lambda$ steps. These walks are now extended probabilistically further by $r$
   steps where each $r$ is independent and uniform in the range $[0, \lambda - 1]$.}
8: for $i = 0$ to $\lambda - 1$ do
9:   For each message, independently with probability $\frac{1}{\lambda - i}$, stop sending the message further and save the
     ID of the source node (in this event, the node with the message is the destination). For messages $M$
     that are not stopped, each node picks a neighbor correspondingly and sends the messages forward as
     before.
10: end for
11: At the end, each destination knows the source ID as well as the length of the corresponding walk.



A.2 Proofs of Lemmas 2.1, 2.2, 2.3 and 2.4

Proof of Lemma 2.1. This proof is a slight modification of the proof of Lemma 2.2 in [11], where it is shown
that each node can perform $\eta$ walks of length $\lambda$ together in $O(\lambda\eta \log n)$ rounds with high probability. We
extend this to the following statement.

Each node $v$ can in fact perform $\eta \deg(v)$ walks of length $2\lambda$ and still finish in $O(\lambda\eta \log n)$ rounds.

The desired claim will follow immediately because each node $v$ performs $\eta \deg(v)$ walks of length at most $2\lambda$ in
Phase 1.

Consider the case when each node $v$ creates $\eta \deg(v) \ge \eta$ messages. For each message $M$, any $j =
1, 2, \ldots, \lambda$, and any edge $e$, we define $X^j_M(e)$ to be a random variable having value 1 if $M$ is sent through
$e$ in the $j$-th iteration (i.e., when the counter on $M$ has value $j - 1$). Let $X^j(e) = \sum_{M: \text{message}} X^j_M(e)$. We
compute the expected number of messages that go through an edge; see the claim below.

Claim A.1. For any edge $e$ and any $j$, $E[X^j(e)] = 2\eta$.

Proof. Assume that each node $v$ starts with $\eta \deg(v)$ messages. Each message takes a random walk. We
prove that after any given number of steps $j$, the expected number of messages at node $v$ is still $\eta \deg(v)$.
Consider the random walk's probability transition matrix, call it $A$. In this case $Au = u$ for the vector $u$
having value $\frac{\deg(v)}{2m}$ at each node $v$, where $m$ is the number of edges in the graph (since this $u$ is the stationary distribution of
an undirected unweighted graph). Now the number of messages we started with at any node $i$ is proportional
to its stationary distribution; therefore, in expectation, the number of messages at any node remains the same.

To calculate $E[X^j(e)]$, notice that edge $e$ will receive messages from its two end points, say $x$ and $y$. The
number of messages it receives from node $x$ in expectation is exactly the number of messages at $x$ divided
by $\deg(x)$. The claim follows. □
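The invariance argument in this proof can be checked with exact arithmetic on a small example. The helper names `expected_counts_after_step` and `expected_edge_load` are hypothetical; the graph is an adjacency-list dictionary.

```python
from fractions import Fraction

def expected_counts_after_step(adj, counts):
    """Expected number of messages at each node after every message takes
    one uniformly random step."""
    new = {v: Fraction(0) for v in adj}
    for v, c in counts.items():
        for w in adj[v]:
            new[w] += Fraction(c, len(adj[v]))
    return new

def expected_edge_load(adj, counts, x, y):
    """Expected number of messages crossing edge (x, y) in one step."""
    return Fraction(counts[x], len(adj[x])) + Fraction(counts[y], len(adj[y]))

eta = 5
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
start = {v: eta * len(graph[v]) for v in graph}   # eta * deg(v) messages at v
# The expected counts are unchanged by a step, and every edge carries 2 * eta.
print(expected_counts_after_step(graph, start) == start)
print(expected_edge_load(graph, start, 0, 3) == 2 * eta)
```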
Algorithm 3 Sample-Destination($v$)

Input: Starting node $v$.
Output: A node sampled from among the stored walks (of length in $[\lambda, 2\lambda - 1]$) from $v$.

Sweep 1: (Perform BFS tree)

1: Construct a Breadth-First-Search (BFS) tree rooted at $v$. While constructing, every node stores its
   parent's ID. Denote such tree by $T$.

Sweep 2: (Tokens travel up the tree, sample as you go)

1: We divide $T$ naturally into levels 0 through $D$ (where nodes in level $D$ are leaf nodes and the root node
   $v$ is in level 0).
2: Tokens are held by nodes as a result of doing walks of length between $\lambda$ and $2\lambda - 1$ from $v$ (which is done
   in either Phase 1 or Get-More-Walks (cf. Algorithm 2)). A node could have more than one token.
3: Every node $u$ that holds token(s) picks one token, denoted by $d_0$, uniformly at random and lets $c_0$ denote
   the number of tokens it has.
4: for $i = D$ down to 0 do
5:   Every node $u$ in level $i$ that either receives token(s) from children or possesses token(s) itself does the
     following.
6:   Let $u$ have tokens $d_0, d_1, d_2, \ldots, d_q$, with counts $c_0, c_1, c_2, \ldots, c_q$ (including its own tokens). The
     node $u$ samples one of $d_0$ through $d_q$, with probabilities proportional to the respective counts. That is,
     for any $0 \le j \le q$, $d_j$ is sampled with probability $\frac{c_j}{c_0 + c_1 + \ldots + c_q}$.
7:   The sampled token is sent to the parent node (unless already at root), along with a count of $c_0 + c_1 +
     \ldots + c_q$ (the count represents the number of tokens from which this token has been sampled).
8: end for
9: The root outputs the ID of the owner of the final sampled token. Denote such node by $u_d$.

Sweep 3: (Go and delete the sampled destination)

1: $v$ sends a message to $u_d$ (e.g., via broadcasting). $u_d$ deletes one token of $v$ it is holding (so that this
   random walk of length $\lambda$ is not reused/re-stitched).



By Chernoff's bound (e.g., in [26, Theorem 4.4]), for any edge $e$ and any $j$,

$P[X^j(e) \ge 4\eta \log n] \le 2^{-4 \log n} = n^{-4}$.

It follows that the probability that there exists an edge $e$ and an integer $1 \le j \le \lambda$ such that $X^j(e) \ge 4\eta \log n$
is at most $|E(G)|\lambda n^{-4} \le \frac{1}{n}$ since $|E(G)| \le n^2$ and $\lambda \le \ell \le n$ (by the way we define $\lambda$).

Now suppose that $X^j(e) \le 4\eta \log n$ for every edge $e$ and every integer $j \le \lambda$. This implies that we can
extend all walks of length $i$ to length $i + 1$ in $4\eta \log n$ rounds. Therefore, we obtain walks of length $\lambda$ in
$4\lambda\eta \log n$ rounds as claimed. □



Proof of Lemma 2.2. The argument is exactly the same as the proof of Lemma 2.4 in [11]. That is, there is
no congestion. We only consider longer walks (length at most $2\lambda - 1$) this time. The detail of the proof is
as follows.

Consider any node $v$ during the execution of the algorithm. If it contains $x$ copies of the source ID, for
some $x$, it has to pick $x$ of its neighbors at random, and pass the source ID to each of these $x$ neighbors.
Although it might pass these messages to fewer than $x$ neighbors, it sends only the source ID and a count
to each neighbor, where the count represents the number of copies of the source ID it wishes to send to that
neighbor. Note that there is only one source ID as one node calls Get-More-Walks at a time. Therefore,
there is no congestion and thus the algorithm terminates in $O(\lambda)$ rounds. □

Proof of Lemma 2.3. This proof is exactly the same as the proof of Lemma 2.5 in [11].
Constructing a BFS tree clearly takes only $O(D)$ rounds. In the second phase, the algorithm wishes
to sample one of many tokens (having its ID) spread across the graph. The sampling is done while retracing
the BFS tree starting from leaf nodes, eventually reaching the root. The main observation is that when a node
receives multiple samples from its children, it only sends one of them to its parent. Therefore, there is no
congestion. The total number of rounds required is therefore the number of levels in the BFS tree, $O(D)$. The
third phase of the algorithm can be done by broadcasting (using a BFS tree) which needs $O(D)$ rounds. □
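The bottom-up sampling of Sweep 2 forwards only one token and one count per tree edge, yet the root ends up with a uniform sample. The following sketch computes the exact forwarding probabilities on a small tree; the name `subtree_sample`, the dictionary-based tree representation, and the assumption that token names are unique are all choices made for this illustration.

```python
from fractions import Fraction

def subtree_sample(tree, tokens, u):
    """Return (count, dist): the number of tokens in the subtree of u and the
    exact probability that each of them is the one u forwards to its parent."""
    candidates = []                      # (count, distribution) pairs seen by u
    own = tokens.get(u, [])
    if own:
        # u first picks one of its own tokens uniformly at random.
        candidates.append((len(own), {tok: Fraction(1, len(own)) for tok in own}))
    for child in tree.get(u, []):
        count, dist = subtree_sample(tree, tokens, child)
        if count:
            candidates.append((count, dist))
    total = sum(count for count, _ in candidates)
    result = {}
    for count, dist in candidates:
        for tok, p in dist.items():
            # A candidate is chosen with probability proportional to its count.
            result[tok] = Fraction(count, total) * p
    return total, result

tree = {"v": ["a", "b"], "a": ["c"]}     # children lists of a tree rooted at v
tokens = {"v": ["t7"], "a": ["t1", "t2"], "c": ["t3"], "b": ["t4", "t5", "t6"]}
total, dist = subtree_sample(tree, tokens, "v")
print(total, dist)
```

In this example every one of the seven tokens reaches the root with probability exactly 1/7, even though the subtrees hold different numbers of tokens.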



Proof of Lemma 2.4. The claim follows from the correctness of Sample-Destination, namely that the algorithm
samples a walk uniformly at random, and the fact that the length of each walk is uniformly sampled from the
range $[\lambda, 2\lambda - 1]$. The first part is proved in Lemma 2.6 in Das Sarma et al. [11] and included below for
completeness. We now prove the second part.

To show that each walk length is uniformly sampled from the range $[\lambda, 2\lambda - 1]$, note that each walk can
be created in two ways.

1. It is created in Phase 1. In this case, since we pick the length of each walk uniformly from the range
$[\lambda, 2\lambda - 1]$, the claim clearly holds.

2. It is created by Get-More-Walks. In this case, the claim holds by the technique of reservoir sampling:
Observe that after the $\lambda$-th step of the walk is completed, we stop extending each walk at any
length between $\lambda$ and $2\lambda - 1$ uniformly. To see this, observe that we stop at length $\lambda$ with probability
$1/\lambda$. If the walk does not stop, it will stop at length $\lambda + 1$ with probability $\frac{1}{\lambda - 1}$. This means that the
walk will stop at length $\lambda + 1$ with probability $\frac{\lambda - 1}{\lambda} \times \frac{1}{\lambda - 1} = \frac{1}{\lambda}$. Similarly, it can be argued that the
walk will stop at length $i$ for any $i \in [\lambda, 2\lambda - 1]$ with probability $\frac{1}{\lambda}$.
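
The reservoir-sampling argument above can be checked with exact arithmetic. The following is a minimal sketch, assuming the stopping rule "after step $\lambda + j$, stop with probability $1/(\lambda - j)$" for $j = 0, \ldots, \lambda - 1$; the function name `stop_length_distribution` and the choice $\lambda = 5$ are illustrative and not part of the algorithm's description.

```python
from fractions import Fraction

def stop_length_distribution(lam):
    """Exact distribution of the stopping length under the assumed rule:
    after step lam + j (j = 0..lam-1), stop with probability 1/(lam - j)."""
    dist = {}
    alive = Fraction(1)  # probability that the walk has not stopped yet
    for j in range(lam):
        p_stop = Fraction(1, lam - j)
        dist[lam + j] = alive * p_stop
        alive *= 1 - p_stop
    return dist

dist = stop_length_distribution(5)
for length, p in sorted(dist.items()):
    print(length, p)
```

Every length in $[\lambda, 2\lambda - 1]$ receives the same probability $1/\lambda$, which is the uniformity the proof relies on.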

We now show the proof of Lemma 2.6 (with slight modification) in Das Sarma et al. for completeness. 

Lemma A.2 (Lemma 2.6 in [11]). Algorithm Sample-Destination($v$) (cf. its pseudocode listing), for any node $v$,
samples a destination of a walk starting at $v$ uniformly at random.

Proof. Assume that before this algorithm starts, there are $t$ (without loss of generality, let $t > 0$) "tokens"
containing the ID of $v$ stored in some nodes in the network. The goal is to show that Sample-Destination
brings one of these tokens to $v$ with uniform probability. For any node $u$, let $T_u$ be the subtree rooted at $u$
and let $S_u$ be the set of tokens in $T_u$. (Therefore, $T_v = T$ and $|S_v| = t$.)

We claim that any node $u$ returns a destination to its parent with uniform probability (i.e., for any token
$x \in S_u$, $\Pr[u \text{ returns } x]$ is $1/|S_u|$ (if $|S_u| > 0$)). We prove this by induction on the height of the tree. This
claim clearly holds for the base case where $u$ is a leaf node. Now, for any non-leaf node $u$, assume that the
claim is true for any of its children. To be precise, suppose that $u$ receives tokens and counts from $q$ children.
Assume that it receives tokens $d_1, d_2, \ldots, d_q$ and counts $c_1, c_2, \ldots, c_q$ from nodes $u_1, u_2, \ldots, u_q$, respectively.
(Also recall that $d_0$ is the sample of its own tokens (if it exists) and $c_0$ is the number of its own tokens.) By
induction, $d_j$ is sent from $u_j$ to $u$ with probability $1/|S_{u_j}|$, for any $1 \le j \le q$. Moreover, $c_j = |S_{u_j}|$ for any
$j$. Therefore, any token $d_j$ will be picked with probability $\frac{1}{|S_{u_j}|} \times \frac{c_j}{c_0 + c_1 + \cdots + c_q} = \frac{1}{|S_u|}$, as claimed.

The lemma follows by applying the claim above to v. □ 
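
The induction can be traced on a small example. The sketch below is an illustration under assumptions, not the distributed implementation: the tree, the token names, and the function `sample_up` are hypothetical. Each node forwards candidate $d_j$ with probability $c_j / (c_0 + \cdots + c_q)$, and the code computes the exact probability that each token reaches the root.

```python
from fractions import Fraction

def sample_up(tree, tokens, u):
    """Return (count, dist) for the subtree rooted at u, where dist maps each
    token in the subtree to the probability that u forwards it to its parent."""
    own = tokens.get(u, [])
    candidates = []
    if own:
        # Candidate 0: u's own sample, uniform over its own tokens.
        candidates.append((len(own), {x: Fraction(1, len(own)) for x in own}))
    for child in tree.get(u, []):
        c, d = sample_up(tree, tokens, child)
        if c > 0:
            candidates.append((c, d))
    total = sum(c for c, _ in candidates)
    dist = {}
    for c, d in candidates:
        for x, p in d.items():
            # Candidate with count c is chosen with probability c / total.
            dist[x] = p * Fraction(c, total)
    return total, dist

# Hypothetical BFS tree rooted at v, with token names chosen for illustration.
tree = {"v": ["a", "b"], "a": ["c"], "b": []}
tokens = {"v": ["t1"], "a": ["t2", "t3"], "c": ["t4"], "b": ["t5", "t6", "t7"]}
count, dist = sample_up(tree, tokens, "v")
print(count, dist)
```

With seven tokens in total, each token reaches the root with probability exactly $1/7$, regardless of where in the tree it is stored.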



□ 



A.3 Proof of Lemma 2.7



Proof. Intuitively, this argument is simple, since the connectors are spread out in steps of length approximately
$\lambda$. However, there might be some periodicity that results in the same node being visited multiple
times but exactly at $\lambda$-intervals. This is where we crucially use the fact that the algorithm uses walks of
length $\lambda + r$ where $r$ is chosen uniformly at random from $[0, \lambda - 1]$.
We prove the lemma using the following two claims.

Claim A.3. Consider any sequence $A$ of numbers $a_1, \ldots, a_{\ell'}$ of length $\ell'$. For any integer $\lambda'$, let $B$ be a
sequence $a_{\lambda' + r_1}, a_{2\lambda' + r_1 + r_2}, \ldots, a_{i\lambda' + r_1 + \cdots + r_i}, \ldots$ where $r_i$, for any $i$, is a random integer picked uniformly
from $[0, \lambda' - 1]$. Consider another subsequence of numbers $C$ of $A$ where an element in $C$ is picked
from "every $\lambda'$ numbers" in $A$; i.e., $C$ consists of $\lfloor \ell'/\lambda' \rfloor$ numbers $c_1, c_2, \ldots$ where, for any $i$, $c_i$ is chosen
uniformly at random from $a_{(i-1)\lambda'+1}, a_{(i-1)\lambda'+2}, \ldots, a_{i\lambda'}$. Then, $\Pr[C \text{ contains } \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}] = \Pr[B =
\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}]$ for any set $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$.

Proof. First consider a subsequence $C$ of $A$. Numbers in $C$ are picked from "every $\lambda'$ numbers" in $A$;
i.e., $C$ consists of $\lfloor \ell'/\lambda' \rfloor$ numbers $c_1, c_2, \ldots$ where, for any $i$, $c_i$ is chosen uniformly at random from
$a_{(i-1)\lambda'+1}, a_{(i-1)\lambda'+2}, \ldots, a_{i\lambda'}$. Observe that $|C| \ge |B|$. In fact, we can say that "$C$ contains $B$"; i.e.,

for any sequence of $k$ indexes $i_1, i_2, \ldots, i_k$ such that $\lambda' \le i_{j+1} - i_j \le 2\lambda' - 1$ for all $j$,

$\Pr[B = \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}] = \Pr[C \text{ contains } \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}]$.

To see this, observe that $B$ will be equal to $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$ only for a specific value of $r_1, r_2, \ldots, r_k$. Since
each of $r_1, r_2, \ldots, r_k$ is chosen uniformly at random from $[0, \lambda' - 1]$, $\Pr[B = \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}] = \lambda'^{-k}$. Moreover,
$C$ will contain $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$ if and only if, for each $j$, we pick $a_{i_j}$ from the interval that contains
it (i.e., from $a_{(i'-1)\lambda'+1}, a_{(i'-1)\lambda'+2}, \ldots, a_{i'\lambda'}$ for some $i'$). (Note that $a_{i_1}, a_{i_2}, \ldots$ are all in different intervals
because $i_{j+1} - i_j \ge \lambda'$ for all $j$.) Therefore, $\Pr[C \text{ contains } \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}] = \lambda'^{-k}$. □

Claim A.4. Consider any sequence $A$ of numbers $a_1, \ldots, a_{\ell'}$ of length $\ell'$. Consider a subsequence of numbers
$C$ of $A$ where an element in $C$ is picked from "every $\lambda'$ numbers" in $A$; i.e., $C$ consists of $\lfloor \ell'/\lambda' \rfloor$
numbers $c_1, c_2, \ldots$ where, for any $i$, $c_i$ is chosen uniformly at random from $a_{(i-1)\lambda'+1}, a_{(i-1)\lambda'+2}, \ldots, a_{i\lambda'}$.
For any number $x$, let $n_x$ be the number of appearances of $x$ in $A$; i.e., $n_x = |\{i \mid a_i = x\}|$. Then, for any
$R \ge 6 n_x/\lambda'$, $x$ appears in $C$ more than $R$ times with probability at most $2^{-R}$.

Proof. For $i = 1, 2, \ldots, \lfloor \ell'/\lambda' \rfloor$, let $X_i$ be a 0/1 random variable that is 1 if and only if $c_i = x$, and let
$X = \sum_{i=1}^{\lfloor \ell'/\lambda' \rfloor} X_i$. That is, $X$ is the number of appearances of $x$ in $C$. Clearly, $E[X] = n_x/\lambda'$. Since the $X_i$'s are
independent, we can apply the Chernoff bound (e.g., in [26, Theorem 4.4]): For any $R \ge 6E[X] = 6 n_x/\lambda'$,

$\Pr[X \ge R] \le 2^{-R}$.

The claim is thus proved. □ 
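
As a sanity check of Claim A.4, the tail probability can be computed exactly for a small instance and compared against $2^{-R}$. This is a sketch under assumptions: the sequence `a`, the block length `lam`, and the helper `block_sample_tail` are made up for illustration, and a single instance does not prove the bound.

```python
from fractions import Fraction
from math import ceil

def block_sample_tail(a, lam, x, r):
    """Exact Pr[X >= r], where X counts how often x is picked when one element
    is drawn uniformly from each consecutive block of lam entries of a."""
    blocks = [a[i:i + lam] for i in range(0, len(a) - lam + 1, lam)]
    # dist[c] = probability that exactly c of the blocks seen so far picked x
    dist = [Fraction(1)]
    for block in blocks:
        p = Fraction(block.count(x), lam)
        new = [Fraction(0)] * (len(dist) + 1)
        for c, q in enumerate(dist):
            new[c] += q * (1 - p)
            new[c + 1] += q * p
        dist = new
    return sum(q for c, q in enumerate(dist) if c >= r)

a = [1, 0, 0, 0, 0, 0, 0, 0] * 6   # hypothetical sequence; x = 1 appears once per block
lam, x = 8, 1
n_x = a.count(x)
r = ceil(6 * Fraction(n_x, lam))   # smallest integer R with R >= 6 * n_x / lam
tail = block_sample_tail(a, lam, x, r)
print(r, tail, Fraction(1, 2 ** r))
```

Here $E[X] = 6/8$, so $R = 5$, and the exact tail is far below $2^{-5}$, consistent with the Chernoff bound.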

Now we use the claims to prove the lemma. Choose $\ell' = \ell$ and $\lambda' = \lambda$ and consider any node $v$ that
appears at most $t$ times. The number of times it appears as a connector node is the number of times it appears
in the subsequence $B$ described in the claim. By applying the claim with $R = t(\log n)^2$, we have that $v$
appears in $B$ more than $t(\log n)^2$ times with probability at most $2^{-R}$, as desired. □



A.4 Proof of Lemma 2.6



We start with the bound of the first and second moment of the number of visits at each node by each walk.
Proposition A.5. For any node $x$, node $y$ and $t = O(m^2)$,

$E[N_t^x(y)] \le 8\, d(y)\sqrt{t+1}$, and $E[(N_t^x(y))^2] \le E[N_t^x(y)] + 128\, d^2(y)\,(t+1)$. (1)

To prove the above proposition, let $P$ denote the transition probability matrix of such a random walk and
let $\pi$ denote the stationary distribution of the walk, which in this case is simply proportional to the degree of
the vertex, and let $\pi_{\min} = \min_x \pi(x)$.

The basic bound we use is the following estimate from Lyons (see Lemma 3.4 and Remark 4 in [25]).
Let $Q$ denote the transition probability matrix of a chain with self-loop probability $\alpha > 0$, and with $c =
\min\{\pi(x)Q(x, y) : x \ne y \text{ and } Q(x, y) > 0\}$. Note that for a random walk on an undirected graph, $c = \frac{1}{2m}$.
For $k > 0$ a positive integer (denoting time),



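$\left| \frac{Q^k(x, y)}{\pi(y)} - 1 \right| \le \min\left\{ \frac{1}{\alpha c \sqrt{k+1}}, \frac{1}{2\alpha^2 c^2 (k+1)} \right\}$. (2)

For $k \le \beta m^2$ for a sufficiently small constant $\beta$, and small $\alpha$, the above can be simplified to the following
bound; see Remark 3 in [25].

$Q^k(x, y) \le \frac{4\, d(y)}{\sqrt{k+1}}$. (3)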

Note that given a simple random walk on a graph G, and a corresponding matrix P, one can always 
switch to the lazy version Q = {I + P)/2, and interpret it as a walk on graph G' , obtained by adding self- 
loops to vertices in G so as to double the degree of each vertex. In the following, with abuse of notation we 
assume our P is such a lazy version of the original one. 

Proof. Let $X_0, X_1, \ldots$ describe the random walk, with $X_i$ denoting the position of the walk at time $i \ge 0$,
and let $\mathbf{1}_A$ denote the indicator (0-1) random variable, which takes the value 1 when the event $A$ is true. In
the following we also use the subscript $x$ to denote the fact that the probability or expectation is with respect
to starting the walk at vertex $x$. First the expectation.



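$E[N_t^x(y)] = E_x\left[\sum_{i=0}^{t} \mathbf{1}_{\{X_i = y\}}\right] = \sum_{i=0}^{t} P^i(x, y)$

$\le 4\, d(y) \sum_{i=0}^{t} \frac{1}{\sqrt{i+1}}$, (using the above inequality (3))

$\le 8\, d(y)\sqrt{t+1}$.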

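Abbreviating $N_t^x(y)$ as $N_t(y)$, we now compute the second moment:

$E[N_t^2(y)] = E_x\left[\left(\sum_{i=0}^{t} \mathbf{1}_{\{X_i = y\}}\right)\left(\sum_{j=0}^{t} \mathbf{1}_{\{X_j = y\}}\right)\right]$

$= E_x\left[\sum_{i=0}^{t} \mathbf{1}_{\{X_i = y\}} + 2 \sum_{0 \le i < j \le t} \mathbf{1}_{\{X_i = y,\, X_j = y\}}\right]$

$= E[N_t(y)] + 2 \sum_{0 \le i < j \le t} \Pr(X_i = y, X_j = y)$.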

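To bound the second term on the right hand side above, consider for $0 \le i < j$:

$\Pr(X_i = y, X_j = y) = \Pr(X_i = y) \Pr(X_j = y \mid X_i = y)$

$= P^i(x, y)\, P^{j-i}(y, y)$, due to the Markovian property

$\le \frac{4\, d(y)}{\sqrt{i+1}} \cdot \frac{4\, d(y)}{\sqrt{j-i+1}}$. (using (3))

Thus,

$\sum_{0 \le i < j \le t} \Pr(X_i = y, X_j = y) \le \sum_{0 \le i \le t} \frac{4\, d(y)}{\sqrt{i+1}} \sum_{0 < j-i \le t-i} \frac{4\, d(y)}{\sqrt{j-i+1}}$

$\le 32\, d^2(y) \sum_{0 \le i \le t} \frac{\sqrt{t-i+1}}{\sqrt{i+1}}$

$\le 64\, d^2(y)\,(t+1)$,

which yields the claimed bound on the second moment in the proposition. □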
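
Both moment bounds rest on two elementary sum estimates: $\sum_{i=0}^{t} \frac{1}{\sqrt{i+1}} \le 2\sqrt{t+1}$ and $\sum_{i=0}^{t} \frac{\sqrt{t-i+1}}{\sqrt{i+1}} \le 2(t+1)$. The following sketch checks them numerically for a few values of $t$; the function names are illustrative, and the check is a spot test, not a proof.

```python
from math import sqrt

def first_moment_sum(t):
    """Sum of 1/sqrt(i+1) for i = 0..t, as used in the bound on E[N_t(y)]."""
    return sum(1 / sqrt(i + 1) for i in range(t + 1))

def second_moment_sum(t):
    """Sum of sqrt(t-i+1)/sqrt(i+1) for i = 0..t, as used in the second moment."""
    return sum(sqrt(t - i + 1) / sqrt(i + 1) for i in range(t + 1))

for t in (0, 1, 10, 1000):
    print(t, first_moment_sum(t), 2 * sqrt(t + 1),
          second_moment_sum(t), 2 * (t + 1))
```

Multiplying the first estimate by $4\,d(y)$ gives the $8\,d(y)\sqrt{t+1}$ bound, and multiplying the second by $32\,d^2(y)$ gives $64\,d^2(y)(t+1)$.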

Using the above proposition, we bound the number of visits of each walk at each node, as follows. 
Lemma A.6. For $t = O(m^2)$ and any vertex $y \in G$, the random walk started at $x$ satisfies:

$\Pr\left(N_t^x(y) \ge 24\, d(y)\sqrt{t+1}\,\log n\right) \le \frac{1}{n^2}$.



Proof. First, it follows from the Proposition that

$\Pr\left(N_t^x(y) \ge 2 \cdot 12\, d(y)\sqrt{t+1}\right) \le \frac{1}{4}$. (4)

This is done by using the standard Chebyshev argument that for $B > 0$, $\Pr(N_t(y) \ge B) \le \Pr(N_t^2(y) \ge B^2) \le \frac{E[N_t^2(y)]}{B^2}$.

For any $r$, let $L_r^x(y)$ be the time that the random walk (started at $x$) visits $y$ for the $r$-th time. Observe that,
for any $r$, $N_t^x(y) \ge r$ if and only if $L_r^x(y) \le t$. Therefore,

$\Pr(N_t^x(y) \ge r) = \Pr(L_r^x(y) \le t)$. (5)

Let $r^* = 24\, d(y)\sqrt{t+1}$. By (4) and (5), $\Pr(L_{r^*}^x(y) \le t) \le \frac{1}{4}$. We claim that

$\Pr(L_{r^* \log n}^x(y) \le t) \le \left(\frac{1}{4}\right)^{\log n} = \frac{1}{n^2}$. (6)

To see this, divide the walk into $\log n$ independent subwalks, each visiting $y$ exactly $r^*$ times. Since the event
$L_{r^* \log n}^x(y) \le t$ implies that all subwalks have length at most $t$, (6) follows. Now, by applying (5) again,

$\Pr(N_t^x(y) \ge r^* \log n) = \Pr(L_{r^* \log n}^x(y) \le t) \le \frac{1}{n^2}$

as desired.

□ 
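
The constants in this proof can be checked directly. The sketch below assumes the moment bounds of Proposition A.5 (second moment at most $8\,d(y)\sqrt{t+1} + 128\,d^2(y)(t+1)$) and a base-2 logarithm; the function name `chebyshev_ratio` and the ranges scanned are illustrative.

```python
from math import sqrt, log2, isclose

def chebyshev_ratio(d, t):
    """Upper bound on E[N^2] / B^2 with B = 24 d sqrt(t+1), using the
    second-moment bound 8 d sqrt(t+1) + 128 d^2 (t+1)."""
    s = sqrt(t + 1)
    second_moment_bound = 8 * d * s + 128 * d ** 2 * (t + 1)
    threshold = 24 * d * s
    return second_moment_bound / threshold ** 2

# The ratio is largest at d = 1, t = 0, where it equals 136/576.
worst = max(chebyshev_ratio(d, t) for d in range(1, 50) for t in range(0, 200))

# Boosting over log n independent subwalks: (1/4)^(log2 n) = 1/n^2.
n = 1024
boosted = 0.25 ** log2(n)
print(worst, boosted, 1 / n ** 2)
```

The ratio stays below $1/4$, which gives (4), and raising $1/4$ to the power $\log n$ gives the $1/n^2$ in (6).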

We now extend the above lemma to bound the number of visits of all the walks at each particular node. 
Lemma A.7. For $\gamma > 0$, and $t = O(m^2)$, and for any vertex $y \in G$, the $k$ random walks started at $x_1, \ldots, x_k$ satisfy:

$\Pr\left(\sum_{i=1}^{k} N_t^{x_i}(y) \ge 24\, d(y)\sqrt{kt+1}\,\log n + k\right) \le \frac{1}{n^2}$.



Proof. First, observe that, for any $r$,

$\Pr\left(\sum_{i=1}^{k} N_t^{x_i}(y) \ge r + k\right) \le \Pr\left[N_{kt}^{y}(y) \ge r\right]$.

To see this, we construct a walk $W$ of length $kt$ starting at $y$ in the following way: For each $i$, denote a walk
of length $t$ starting at $x_i$ by $W_i$. Let $\tau_i$ and $\tau_i'$ be the first and last time (not later than time $t$) that $W_i$ visits
$y$. Let $W_i'$ be the subwalk of $W_i$ from time $\tau_i$ to $\tau_i'$. We construct a walk $W$ by stitching $W_1', W_2', \ldots, W_k'$
together and completing the rest of the walk (to reach the length $kt$) by a normal random walk. It then follows
that the number of visits to $y$ by $W_1, W_2, \ldots, W_k$ (excluding the starting step) is at most the number of visits
to $y$ by $W$. The first quantity is $\sum_{i=1}^{k} N_t^{x_i}(y) - k$. (The term '$-k$' comes from the fact that we do not count
the first visit to $y$ by each $W_i$, which is the starting step of each $W_i'$.) The second quantity is $N_{kt}^{y}(y)$. The
observation thus follows.
Therefore,

$\Pr\left(\sum_{i=1}^{k} N_t^{x_i}(y) \ge 24\, d(y)\sqrt{kt+1}\,\log n + k\right) \le \Pr\left(N_{kt}^{y}(y) \ge 24\, d(y)\sqrt{kt+1}\,\log n\right) \le \frac{1}{n^2}$,

where the last inequality follows from Lemma A.6. □



Lemma 2.6 follows immediately from Lemma A.7 by union bounding over all nodes. 



A.5 Proof of Theorem 2.8



Proof. First, consider the case where $\lambda > \ell$. In this case, $\min(\sqrt{k\ell D} + k, \sqrt{k\ell} + k + \ell) = \tilde{O}(\sqrt{k\ell} + k + \ell)$.
By Lemma 2.6, each node $x$ will be visited at most $\tilde{O}(d(x)(\sqrt{k\ell} + k))$ times. Therefore, using the same
argument as Lemma 2.1, the congestion is $\tilde{O}(\sqrt{k\ell} + k)$ with high probability. Since the dilation is $\ell$, Many-
Random-Walks takes $\tilde{O}(\sqrt{k\ell} + k + \ell)$ rounds as claimed. Since $2\sqrt{k\ell} \le k + \ell$, this bound reduces to
$\tilde{O}(k + \ell)$.

Now, consider the other case where $\lambda \le \ell$. In this case, $\min(\sqrt{k\ell D} + k, \sqrt{k\ell} + k + \ell) = \tilde{O}(\sqrt{k\ell D} +
k)$. Phase 1 takes $\tilde{O}(\lambda\eta) = \tilde{O}(\sqrt{k\ell D} + k)$. The stitching in Phase 2 takes $\tilde{O}(k\ell D/\lambda) = \tilde{O}(\sqrt{k\ell D})$.
Moreover, by Lemma 2.6, Get-More-Walks will never be invoked. Therefore, the total number of rounds
is $\tilde{O}(\sqrt{k\ell D} + k)$ as claimed. □

B Omitted Proofs of Section 3 (Lower Bound)

B.1 Proof of Lemma 3.4



Proof. After the first $k$ free rounds, consider the intervals that the left subtree can have, in the best case.
Recall that these $k$ rounds allowed communication only along the path. The path_dist of any node in $L$ from
the breakpoints of $sub(L)$ along the path is at least $k + 1$. □



B.2 Proof of Lemma 3.5 



Proof. First, notice that each left breakpoint is at a path-distance of $k + 1$ from every node in the right subtree.
That is, $path\_dist(u, L) = path\_dist(v, R) = k + 1$ for all $u \in B_l$ and all $v \in B_r$.

Each breakpoint needs to be combined into one interval in the end. However, there could be one interval
that is communicated from $sub(l)$ to $sub(r)$ (or vice versa) such that it connects several breakpoints.
We show that this cannot happen. Consider all the breakpoints $v \in B_l \cup B_r$.
Definition of scratching.

Let us say that we scratch out the breakpoints from the list $k + 1$, $k'/2 + k + 1$, $k' + k + 1$, $k' + k'/2 + k + 1$,
$2k' + k + 1$, ... that get connected when an interval is communicated between $sub(l)$ and $sub(r)$. We scratch
out a breakpoint if there is an interval in the graph that contains it and both of its adjacent breakpoints (or one,
in the case of the first and last breakpoints). For example, if the left subtree has intervals $[1, k'/2 + k]$ and
$[k'/2 + k + 2, k' + k'/2 + k + 1]$ and the right subtree has $[k + 2, k' + k]$, and the latter interval is communicated
to a node in the left subtree, then the left subtree is able to obtain the merged interval $[1, k' + k'/2 + k + 1]$,
and therefore breakpoints $k + 1$ and $k'/2 + k + 1$ are scratched out.
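
The merging step in this example can be reproduced with concrete numbers. The sketch below is an illustration only: it assumes that two intervals can be merged whenever they share at least one number, and the values $k = 3$, $k' = 8$ and the function `merge_intervals` are hypothetical.

```python
def merge_intervals(intervals):
    """Merge closed integer intervals that share at least one number
    (an assumed merging rule, for illustration)."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

k, kp = 3, 8  # hypothetical values of k and k'
left = [(1, kp // 2 + k), (kp // 2 + k + 2, kp + kp // 2 + k + 1)]
received = (k + 2, kp + k)   # interval sent over from the right subtree

before = merge_intervals(left)
after = merge_intervals(left + [received])
print(before, after)
```

Before the communication the left subtree holds two separate intervals, $[1, 7]$ and $[9, 16]$; after receiving $[5, 11]$ it holds the single interval $[1, 16] = [1, k' + k'/2 + k + 1]$.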

Claim B.1. At most $O(1)$ breakpoints can be scratched out with one message/interval communicated between
$sub(r)$ and $sub(l)$.

Proof. We argue that with the communication of one interval across the left and right subtrees, at most
4 breakpoints that have not been scratched yet can get scratched. This follows from a simple inductive
argument. Consider a situation where the left subtree has certain intervals with all overlapping intervals
already merged, and similarly for the right subtree. Suppose an interval $X$ is communicated between $sub(r)$ and
$sub(l)$. One of the following cases arises:

• $X$ contains one breakpoint: It can be merged with at most two other intervals. Therefore, at most three
breakpoints can get scratched.

• $X$ contains two breakpoints: It can get connected with at most two other intervals and therefore at most
four breakpoints can get scratched.

• $X$ contains more than two breakpoints: This is impossible since there are at most two breakpoints in each
interval, its leftmost and rightmost numbers (by definition of scratching).

This completes the proof of the claim. □ 



The proof now follows from Lemma 3.4. For any breakpoint $b$, let $M_b$ be the set of messages that
represents an interval containing $b$ while $b$ is still unscratched. If $b$ is in $sub(l)$ and gets scratched because of
the combination of some intervals in $sub(r)$, then we claim that $M_b$ has covered a path-distance of at least
$k$. (Define the path-distance covered by $M_b$ as the total path-distance covered by all messages in $M_b$.) This
is because $b = v_i$ (say), being a breakpoint in $sub(l)$, has $i$ equal to $(k + 1 \bmod k')$. Therefore, $b$ is at a
path-distance of at least $k$ from any node in $R$. Consequently, $b$ is at a path-distance of at least $k$ from any
node in $sub(r)$. Since there are $\Theta(n/k)$ breakpoints, and for any interval to be communicated across the left
and right subtree, a path-distance of $k$ must be covered, in total, $\Theta(n)$ path-distance must be covered for all
breakpoints to be scratched. This follows from three main observations:

• As shown above, for any breakpoint to be scratched, an interval with a breakpoint must be communicated
from $sub(l)$ to $sub(r)$ or vice versa (thereby all messages $m$ containing the breakpoint together covering
a path-distance of at least $k$).

• Any message/interval with unscratched breakpoints has at most two unscratched breakpoints.

• As shown in Claim B.1, at most four breakpoints can be scratched when two intervals are merged.



The proof follows. (Also see Figure 5(b) for the idea of this proof.) □ 



B.3 Proof of Lemma 3.6 



Proof. We consider the total number of messages that can go through nodes at any level of the graph, starting
from level 0 to level $\log k$ under the congest model.

First notice that if a message is passed at level $i$ of the tree, this can cover a path_dist of at most $2^i$. This
is because the subtree rooted at a node at level $i$ has $2^i$ leaves. Further, by our construction, there are $2^{\log(k') - i}$
nodes at level $i$. Therefore, all nodes at level $i$ together, in a given round of $A$, can cover a path_dist (path
distance) of at most $2^i \cdot 2^{\log(k') - i} = k' = 4k + 2$. Therefore, over $k$ rounds, the total path_dist that can be covered
in a single level is $k(k')$. Since there are $O(\log k)$ levels, the total path_dist that can be covered in $k$ rounds
over the entire graph is $O(k^2 \log k)$. (See Figure 5(c).) □

C Omitted Proofs of Section 4.2 (Mixing Time)

C.1 Brief description of algorithm for Theorem 4.5



The algorithm partitions the set of nodes into buckets based on the steady state probabilities. Each of the
$\tilde{O}(n^{1/2}\,\mathrm{poly}(\epsilon^{-1}))$ samples from $X$ now falls in one of these buckets. Further, the actual number of
nodes in these buckets for distribution $Y$ is counted. The exact count for $Y$ for at most $\tilde{O}(n^{1/2}\,\mathrm{poly}(\epsilon^{-1}))$
buckets (corresponding to the samples) is compared with the number of samples from $X$ to
determine if $X$ and $Y$ are close. We refer the reader to their paper [6] for a precise description.



C.2 Proof of Lemma 4.4



Proof. The monotonicity follows from the fact that $\|Ax\|_1 \le \|x\|_1$ where $A$ is the transpose of the transition
probability matrix of the graph and $x$ is any probability vector. That is, $A(i, j)$ denotes the probability of
transitioning from node $j$ to node $i$. This in turn follows from the fact that the sum of entries of any column
of $A$ is 1.

Now let $\pi$ be the stationary distribution of the transition matrix $A$. This implies that if $\ell$ is $\epsilon$-near mixing,
then $\|A^{\ell}u - \pi\|_1 \le \epsilon$, by definition of $\epsilon$-near mixing time. Now consider $\|A^{\ell+1}u - \pi\|_1$. This is equal to
$\|A^{\ell+1}u - A\pi\|_1$ since $A\pi = \pi$. However, this reduces to $\|A(A^{\ell}u - \pi)\|_1 \le \epsilon$. It follows that $(\ell + 1)$ is
$\epsilon$-near mixing. □
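
The monotonicity can be observed numerically on a small graph. The following sketch uses exact rational arithmetic; the example graph (a triangle with one pendant node), the source node, and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def transition_matrix(adj):
    """Column-stochastic A for the simple random walk: A[i][j] = Pr[j -> i]."""
    n = len(adj)
    A = [[Fraction(0)] * n for _ in range(n)]
    for j, nbrs in enumerate(adj):
        for i in nbrs:
            A[i][j] += Fraction(1, len(nbrs))
    return A

def step(A, x):
    """One step of the walk: returns the vector A x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def l1_distances(adj, source, steps):
    """L1 distance between A^l u and pi for l = 1..steps, u the source indicator."""
    A = transition_matrix(adj)
    two_m = sum(len(nbrs) for nbrs in adj)
    pi = [Fraction(len(nbrs), two_m) for nbrs in adj]  # pi(i) = deg(i) / 2m
    x = [Fraction(int(i == source)) for i in range(len(adj))]
    out = []
    for _ in range(steps):
        x = step(A, x)
        out.append(sum(abs(a - b) for a, b in zip(x, pi)))
    return out

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]  # hypothetical graph: triangle 0-1-2 with pendant node 3
dists = l1_distances(adj, source=3, steps=12)
print([float(d) for d in dists])
```

The printed distances never increase from one step to the next, so once some length is $\epsilon$-near mixing, every longer length is as well.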



C.3 Proof of Theorem 4.6 



Proof. For undirected unweighted graphs, the stationary distribution of the random walk is known and is
$\frac{deg(i)}{2m}$ for node $i$ with degree $deg(i)$, where $m$ is the number of edges in the graph. If a source node in
the network knows the degree distribution, we only need $\tilde{O}(n^{1/2}\,\mathrm{poly}(\epsilon^{-1}))$ samples from a distribution to
compare it to the stationary distribution. This can be achieved by running MultipleRandomWalk to
obtain $K = \tilde{O}(n^{1/2}\,\mathrm{poly}(\epsilon^{-1}))$ random walks. We choose $\epsilon = 1/12e$. To find the approximate mixing time,
we try out increasing values of $\ell$ that are powers of 2. Once we find the right consecutive powers of 2, the
monotonicity property admits a binary search to determine the exact value for the specified $\epsilon$.

The result in [6] can also be adapted to compare with the steady state distribution even if the source does
not know the entire distribution. As described previously, the source only needs to know the number
of nodes with steady state probability in given buckets. Specifically, the buckets of interest number at most
$\tilde{O}(n^{1/2}\,\mathrm{poly}(\epsilon^{-1}))$, as the count is required only for buckets from which a sample is drawn. Since each node
knows its own steady state probability (determined just by its degree), the source can broadcast a specific
bucket's information and recover, in $O(D)$ steps, the number of nodes that fall into this bucket. Using
the standard upcast technique previously described, the source can obtain the bucket count for each of these
at most $\tilde{O}(n^{1/2}\,\mathrm{poly}(\epsilon^{-1}))$ buckets in $\tilde{O}(n^{1/2}\,\mathrm{poly}(\epsilon^{-1}) + D)$ rounds.

We have shown previously that a source node can obtain $K$ samples from $K$ independent random walks
of length $\ell$ in $\tilde{O}(K + \sqrt{K\ell D})$ rounds. Setting $K = \tilde{O}(n^{1/2}\,\mathrm{poly}(\epsilon^{-1}) + D)$ completes the proof. □
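
The search over walk lengths used in this proof (doubling, then binary search between consecutive powers of 2) can be sketched as follows. This is an illustration under assumptions: `smallest_true` and `toy_test` are hypothetical names, and the deterministic predicate stands in for the randomized closeness test that would be run on the $K$ walk samples.

```python
def smallest_true(pred):
    """Smallest integer l >= 1 with pred(l) true, assuming pred is monotone
    (false up to some point, true from then on)."""
    hi = 1
    while not pred(hi):               # doubling phase: try l = 1, 2, 4, ...
        hi *= 2
    lo = hi // 2 + 1 if hi > 1 else 1  # pred(hi // 2) was false
    while lo < hi:                    # binary search in (hi/2, hi]
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

calls = []

def toy_test(l, threshold=37):
    """Stand-in for 'length l is epsilon-near mixing' (assumed monotone)."""
    calls.append(l)
    return l >= threshold

answer = smallest_true(toy_test)
print(answer, len(calls))
```

The number of tests is logarithmic in the answer, which is why the search adds only a logarithmic factor on top of the cost of one closeness test.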

D Figures 






Figure 1: Example of the path verification problem. (a) In the beginning, we want to verify that the vertices containing
numbers 1..5 form a path. (In this case, they form a path a, b, c, d, a.) (b) One way to do this is for a to send 1 to b, and
therefore b can check that the two vertices a and b corresponding to labels 1 and 2 form a path. (The interval [1, 2] is used to
represent the fact that vertices corresponding to numbers 1, 2 are verified to form a path.) Similarly, c can verify [3, 5].
(c) Finally, c combines [1, 2] with [3, 5] and thus the path corresponding to numbers 1, 2, ..., 5 is verified.




Figure 2: Figure illustrating the algorithm for stitching short walks together.

Figure 4: Breakpoints. (a) $L$ and $R$ consist of every other $k'/2$ vertices in $P$. (Note that we show the vertices $l$ and $r$
appearing many times for the convenience of presentation.) (b) $v_{k'/2+k+1}$ and $v_{k'+k'/2+k+1}$ (nodes in black) are two of
the breakpoints for $L$. Notice that there is one breakpoint in every connected piece of $L$ and $R$.

Figure 5: (a) Path distance between 1 and 2 is the number of leaves in the subtree rooted at 3, the lowest common
ancestor of 1 and 2. (b) For one unscratched left breakpoint, $k'/2 + k + 1$, to be combined with another right breakpoint
$k + 1$ on the left, $k'/2 + k + 1$ has to be carried to $L$ by some intervals. Moreover, one interval can carry at most two
unscratched breakpoints at a time. (c) Sending a message between nodes on level $i$ and $i - 1$ can increase the covered
path distance by at most $2^i$.